Re: [RFC] dropping support for ancient versions of curl

2017-04-03 Thread Jessie Hernandez
> On Mon, Apr 03, 2017 at 10:54:38PM -0400, Jeff King wrote:
>
>> If we declared 7.16.0 as a cutoff, we could unconditionally define
>> USE_CURL_MULTI, which gets rid of quite a few messy ifdefs.
>
> That version came out 11 years ago. Here's what that patch would look
> like (on top of my other one, but note I missed one older ifdef in the
> last one that's included here).
>
> And we could go further. There are a number of ifdefs around 7.19.1 (Nov
> 2008), 7.22 (Sep 2011). A 5-year cutoff puts us at 7.24. That's getting
> close enough that I'd probably start looking at what long-term distros
> like RHEL are still shipping and supporting.

Hi Peff,

I am all for making code simpler and dropping ancient versions.
I think we would have to look at around the 7.19.1 mark, though.

I can confirm that my CentOS 6.8 machine, and RHEL 6.8 on the client
side, use curl 7.19.7 [1]. I do not know what other distros are shipping.
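
For anyone else checking their distro, a quick sketch (note this assumes
the curl binary on the PATH matches the libcurl that git links against,
which is not guaranteed):

```shell
# Show the installed curl tool/library version; on an RPM-based distro
# like RHEL/CentOS, querying the package manager is more authoritative.
curl --version | head -n1
# rpm -q libcurl        # uncomment on RHEL/CentOS
```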


Jessie


Re: [RFC] dropping support for ancient versions of curl

2017-04-03 Thread Jeff King
On Mon, Apr 03, 2017 at 10:54:38PM -0400, Jeff King wrote:

> If we declared 7.16.0 as a cutoff, we could unconditionally define
> USE_CURL_MULTI, which gets rid of quite a few messy ifdefs.

That version came out 11 years ago. Here's what that patch would look
like (on top of my other one, but note I missed one older ifdef in the
last one that's included here).

And we could go further. There are a number of ifdefs around 7.19.1 (Nov
2008), 7.22 (Sep 2011). A 5-year cutoff puts us at 7.24. That's getting
close enough that I'd probably start looking at what long-term distros
like RHEL are still shipping and supporting.

-Peff

---
 Documentation/config.txt |  3 +-
 http-push.c  | 23 --
 http-walker.c| 12 
 http.c   | 57 +--
 http.h   | 18 ---
 remote-curl.c|  4 ---
 6 files changed, 2 insertions(+), 115 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d5..2b04c1777 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1942,8 +1942,7 @@ http.maxRequests::
 http.minSessions::
The number of curl sessions (counted across slots) to be kept across
requests. They will not be ended with curl_easy_cleanup() until
-   http_cleanup() is invoked. If USE_CURL_MULTI is not defined, this
-   value will be capped at 1. Defaults to 1.
+   http_cleanup() is invoked. Defaults to 1.
 
 http.postBuffer::
Maximum size in bytes of the buffer used by smart HTTP
diff --git a/http-push.c b/http-push.c
index f0e3108f7..40146cdd6 100644
--- a/http-push.c
+++ b/http-push.c
@@ -198,10 +198,8 @@ static void curl_setup_http(CURL *curl, const char *url,
curl_easy_setopt(curl, CURLOPT_INFILE, buffer);
curl_easy_setopt(curl, CURLOPT_INFILESIZE, buffer->buf.len);
curl_easy_setopt(curl, CURLOPT_READFUNCTION, fread_buffer);
-#ifndef NO_CURL_IOCTL
curl_easy_setopt(curl, CURLOPT_IOCTLFUNCTION, ioctl_buffer);
curl_easy_setopt(curl, CURLOPT_IOCTLDATA, buffer);
-#endif
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_fn);
curl_easy_setopt(curl, CURLOPT_NOBODY, 0);
curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, custom_req);
@@ -244,8 +242,6 @@ static void process_response(void *callback_data)
finish_request(request);
 }
 
-#ifdef USE_CURL_MULTI
-
 static void start_fetch_loose(struct transfer_request *request)
 {
struct active_request_slot *slot;
@@ -295,7 +291,6 @@ static void start_mkcol(struct transfer_request *request)
request->url = NULL;
}
 }
-#endif
 
 static void start_fetch_packed(struct transfer_request *request)
 {
@@ -600,7 +595,6 @@ static void finish_request(struct transfer_request *request)
}
 }
 
-#ifdef USE_CURL_MULTI
 static int is_running_queue;
 static int fill_active_slot(void *unused)
 {
@@ -624,7 +618,6 @@ static int fill_active_slot(void *unused)
}
return 0;
 }
-#endif
 
 static void get_remote_object_list(unsigned char parent);
 
@@ -653,10 +646,8 @@ static void add_fetch_request(struct object *obj)
request->next = request_queue_head;
request_queue_head = request;
 
-#ifdef USE_CURL_MULTI
fill_active_slots();
step_active_slots();
-#endif
 }
 
 static int add_send_request(struct object *obj, struct remote_lock *lock)
@@ -691,10 +682,8 @@ static int add_send_request(struct object *obj, struct remote_lock *lock)
request->next = request_queue_head;
request_queue_head = request;
 
-#ifdef USE_CURL_MULTI
fill_active_slots();
step_active_slots();
-#endif
 
return 1;
 }
@@ -1673,21 +1662,15 @@ static int delete_remote_branch(const char *pattern, int force)
 
 static void run_request_queue(void)
 {
-#ifdef USE_CURL_MULTI
is_running_queue = 1;
fill_active_slots();
add_fill_function(NULL, fill_active_slot);
-#endif
do {
finish_all_active_slots();
-#ifdef USE_CURL_MULTI
fill_active_slots();
-#endif
} while (request_queue_head && !aborted);
 
-#ifdef USE_CURL_MULTI
is_running_queue = 0;
-#endif
 }
 
 int cmd_main(int argc, const char **argv)
@@ -1763,10 +1746,6 @@ int cmd_main(int argc, const char **argv)
break;
}
 
-#ifndef USE_CURL_MULTI
-   die("git-push is not available for http/https repository when not compiled with USE_CURL_MULTI");
-#endif
-
if (!repo->url)
usage(http_push_usage);
 
@@ -1779,9 +1758,7 @@ int cmd_main(int argc, const char **argv)
 
http_init(NULL, repo->url, 1);
 
-#ifdef USE_CURL_MULTI
is_running_queue = 0;
-#endif
 
/* Verify DAV compliance/lock support */
if (!locking_available()) {
diff --git a/http-walker.c b/http-walker.c
index ee049cb13..b5b8e03b0 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -119,7 +119,6 @@ static 

[RFC] dropping support for ancient versions of curl

2017-04-03 Thread Jeff King
A nearby thread raised the question of whether we can rely on a version
of libcurl that contains a particular feature. The version in question
is curl 7.11.1, which came out in March 2004.

My feeling is that this is old enough to stop caring about. Which means
we can drop some of the #ifdefs that clutter the HTTP code (and there's
a patch at the end of this mail dropping support for everything older
than 7.11.1). But that made me wonder: how old is too old? I think it's
nice that we don't force people to upgrade to the latest version of
curl. But at some point, if you are running a 13-year-old version of
libcurl, how likely are you to run a brand new version of Git? :)

If we declared 7.16.0 as a cutoff, we could unconditionally define
USE_CURL_MULTI, which gets rid of quite a few messy ifdefs.

-Peff

---
 http.c| 51 ---
 http.h| 11 
 remote-curl.c |  3 ---
 3 files changed, 65 deletions(-)

diff --git a/http.c b/http.c
index 8d94e2c63..7a81f0b68 100644
--- a/http.c
+++ b/http.c
@@ -12,19 +12,11 @@
 #include "transport.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
-#if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
-#else
-long int git_curl_ipresolve;
-#endif
 int active_requests;
 int http_is_verbose;
 size_t http_post_buffer = 16 * LARGE_PACKET_MAX;
 
-#if LIBCURL_VERSION_NUM >= 0x070a06
-#define LIBCURL_CAN_HANDLE_AUTH_ANY
-#endif
-
 static int min_curl_sessions = 1;
 static int curl_session_count;
 #ifdef USE_CURL_MULTI
@@ -57,12 +49,8 @@ static struct {
{ "tlsv1.2", CURL_SSLVERSION_TLSv1_2 },
 #endif
 };
-#if LIBCURL_VERSION_NUM >= 0x070903
 static const char *ssl_key;
-#endif
-#if LIBCURL_VERSION_NUM >= 0x070908
 static const char *ssl_capath;
-#endif
 #if LIBCURL_VERSION_NUM >= 0x072c00
 static const char *ssl_pinnedkey;
 #endif
@@ -81,9 +69,7 @@ static struct {
{ "digest", CURLAUTH_DIGEST },
{ "negotiate", CURLAUTH_GSSNEGOTIATE },
{ "ntlm", CURLAUTH_NTLM },
-#ifdef LIBCURL_CAN_HANDLE_AUTH_ANY
{ "anyauth", CURLAUTH_ANY },
-#endif
/*
 * CURLAUTH_DIGEST_IE has no corresponding command-line option in
 * curl(1) and is not included in CURLAUTH_ANY, so we leave it out
@@ -123,7 +109,6 @@ enum http_follow_config http_follow_config = HTTP_FOLLOW_INITIAL;
 
 static struct credential cert_auth = CREDENTIAL_INIT;
 static int ssl_cert_password_required;
-#ifdef LIBCURL_CAN_HANDLE_AUTH_ANY
 static unsigned long http_auth_methods = CURLAUTH_ANY;
 static int http_auth_methods_restricted;
 /* Modes for which empty_auth cannot actually help us. */
@@ -133,7 +118,6 @@ static unsigned long empty_auth_useless =
| CURLAUTH_DIGEST_IE
 #endif
| CURLAUTH_DIGEST;
-#endif
 
 static struct curl_slist *pragma_header;
 static struct curl_slist *no_pragma_header;
@@ -207,12 +191,8 @@ static void finish_active_slot(struct active_request_slot *slot)
if (slot->results != NULL) {
slot->results->curl_result = slot->curl_result;
slot->results->http_code = slot->http_code;
-#if LIBCURL_VERSION_NUM >= 0x070a08
curl_easy_getinfo(slot->curl, CURLINFO_HTTPAUTH_AVAIL,
  &slot->results->auth_avail);
-#else
-   slot->results->auth_avail = 0;
-#endif
 
curl_easy_getinfo(slot->curl, CURLINFO_HTTP_CONNECTCODE,
	&slot->results->http_connectcode);
@@ -272,14 +252,10 @@ static int http_options(const char *var, const char *value, void *cb)
	return git_config_string(&ssl_version, var, value);
if (!strcmp("http.sslcert", var))
	return git_config_string(&ssl_cert, var, value);
-#if LIBCURL_VERSION_NUM >= 0x070903
if (!strcmp("http.sslkey", var))
	return git_config_string(&ssl_key, var, value);
-#endif
-#if LIBCURL_VERSION_NUM >= 0x070908
if (!strcmp("http.sslcapath", var))
	return git_config_pathname(&ssl_capath, var, value);
-#endif
if (!strcmp("http.sslcainfo", var))
	return git_config_pathname(&ssl_cainfo, var, value);
if (!strcmp("http.sslcertpasswordprotected", var)) {
@@ -398,12 +374,6 @@ static int curl_empty_auth_enabled(void)
if (curl_empty_auth >= 0)
return curl_empty_auth;
 
-#ifndef LIBCURL_CAN_HANDLE_AUTH_ANY
-   /*
-* Our libcurl is too old to do AUTH_ANY in the first place;
-* just default to turning the feature off.
-*/
-#else
/*
 * In the automatic case, kick in the empty-auth
 * hack as long as we would potentially try some
@@ -416,7 +386,6 @@ static int curl_empty_auth_enabled(void)
if (http_auth_methods_restricted &&
(http_auth_methods & ~empty_auth_useless))
return 1;
-#endif
return 0;
 }
 
@@ -487,7 +456,6 @@ static void init_curl_proxy_auth(CURL *result)
 
	var_override(&http_proxy_authmethod, 

Re: [PATCH v5] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread Jeff King
On Mon, Apr 03, 2017 at 06:32:10PM -0700, Jonathan Nieder wrote:

> David Turner wrote:
> 
> > This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
> > buffer size.
> 
> Neat.
> 
> For completeness, it's useful to know this was added in curl 7.11.1,
> which is old enough for us to be able to count on users having it (in
> fact it was released >10 years ago).

We have a number of LIBCURL_VERSION_NUM checks that are older than that.
I'm totally OK with declaring a 13-year old version of curl as obsolete,
but we should probably consider dropping some of those ancient ifdefs. I
wouldn't be surprised if some of them are buggy or even fail to compile
at this point.

-Peff


Re: [PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-03 Thread Jeff King
On Mon, Apr 03, 2017 at 10:31:03PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Apr 3, 2017 at 8:53 PM,   wrote:
> > Teach fsck to override core.checksumindex and always verify
> > the index checksum when reading the index.
> 
> Sorry to only chime in about this at v4.
> 
> I think this patch & the documentation you added for
> core.checksumindex in 1/4 would be much less confusing if you made
> this a on-by-default command-line option like e.g. "full".
> 
> With this patch nothing amends the documentation to indicate that the
> core.checksumindex is magically overridden when fsck runs, I think
> something like this (needs amending to integrate) on top would make
> this clearer:

I think that is the wrong direction to reduce confusion. We don't need
more options to twiddle this flag, we need fewer. For instance, in your
proposed documentation:

> @@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs` namespace, and all reflogs
> object pools.  This is now default; you can turn it off
> with --no-full.
> 
> +--[no-]checksum-index:
> +   Validate the checksum at the end of the index file, on by
> +   default, locally overrides any "core.checksumIndex" setting
> +   unless negated. See linkgit:git-config[1].

That tells us _what_ it does, but I'm left scratching my head with
"why".

I don't think there is any reason you would want fsck not to compute
that checksum (just like there is no option to ask fsck not to check
pack sha1 trailers).

I would go so far as to say that the config option itself is unnecessary
in this iteration of the series. I only asked for it so that we could
test the verification code paths (both for correctness and performance).
But if fsck can exercise the code path, we can check correctness that
way. And for performance, it's probably enough to test two separate
builds (which Jeff has already done).

Junio also asked for the usual "add a config, and then later we'll flip
the default" procedure. But IMHO that isn't necessary here. Nobody
should ever care about this flag. It was largely useless to check it on
every read in the first place. And if you suspect there's corruption in
your repository, you should run "git fsck".

-Peff


Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-03 Thread Jeff King
On Mon, Apr 03, 2017 at 11:57:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> +These features can be enabled on git servers, they won't help the
> >> +performance of the servers themselves,
> >
> > Is that true for bitmaps?  I thought they reduced CPU usage on
> > the server side...
> 
> I'm not sure, JK? From my reading of the repack.writeBitmaps docs it
> seems to only help clone/fetch for the client, but maybe they do more
> than that.

Bitmaps reduce the CPU required to do the "Counting" phase of
pack-objects. For serving a fetch or clone, the server side is happy
because they use less CPU, but the client is happy because the server
moves to the "Writing" phase more quickly.

Bitmaps also help with pushes, but this is usually less interesting. You
don't tend to push all of history over and over (whereas people _do_
tend to clone all of history over and over).

They don't speed up the counting portion of a regular repack. In theory
they could, but the resulting packs may grow less optimal over time
(e.g., we can't compute the same history-based write order, so over time
your objects would get jumbled, leading to worse cold-cache behavior).

You can also use bitmaps for other reachability computations, but we
don't do so currently. I have patches that I need to clean up to use
them for "git prune", doing ahead/behind checks, --contains, etc.

> I also see we should mention pack.writeBitmapHashCache, which
> according to my reading of v2.0.0-rc0~13^2~8 only helps clone/fetch.

Yes, it helps the delta search heuristic, so only pack-objects would
ever benefit. This should basically be turned on all the time, as
without it fetches from partially-bitmapped repos (i.e., when you've
gotten some pushes but haven't repacked yet) do a really bad job of
finding deltas (they waste too much time and deliver sub-optimal packs).

Arguably it should be the default. The initial patches made it optional
for strict JGit compatibility (I don't know if JGit ever implemented the
extension). We've had it on at GitHub since day one, so I don't have any
operational experience with turning it off (aside from the simulated
numbers in that commit message).
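
For a server operator who wants to try this, the knobs discussed above
are repack.writeBitmaps and pack.writeBitmapHashCache. A minimal sketch
(the repository path is a stand-in):

```shell
# Enable bitmap writing plus the hash-cache extension on a bare,
# server-side repository, then do a full repack so the .bitmap file
# gets written alongside the pack.
git init --bare /tmp/bitmap-demo.git
git -C /tmp/bitmap-demo.git config repack.writeBitmaps true
git -C /tmp/bitmap-demo.git config pack.writeBitmapHashCache true
git -C /tmp/bitmap-demo.git repack -ad
```

With repack.writeBitmaps set, a plain "repack -ad" behaves as if -b had
been given, so existing maintenance cron jobs pick it up without change.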

> > A sidenote: I wonder if bitmaps should be the default for bare
> > repos, since bare repos are likely used on servers.

That's an interesting notion. It's a net loss if you don't serve a lot
of fetches, because it's trying to amortize the extra CPU during the
repack with faster fetches and clones. So it makes sense for a hosting
site, but less for somebody pushing to a personal bare repo.

-Peff


Re: Bug? git submodule update --reference doesn't use the referenced repository

2017-04-03 Thread Maxime Viargues

On 04-Apr-17 4:32 AM, Stefan Beller wrote:

On Sun, Apr 2, 2017 at 8:13 PM, Maxime Viargues
 wrote:

Hi there,

I have been trying to use the --reference option to clone a big repository
using a local copy, but I can't manage to make it work using sub-module
update. I believe this is a bug, unless I missed something.
I am on Windows, Git 2.12.0

which is new enough that the new --reference code is in. :)


So the problem is as follow:
- I have got a repository with multiple sub-modules, say
 main
 lib1
 sub-module1.git
 lib2
 sub-module2.git
- The original repositories are in GitHub, which makes it slow
- I have done a normal git clone of the entire repository (not bare) and put
it on a file server, say \\fileserver\ref_repo\
(Note that the problem also happens with local copy)

So if I do a clone to get the repo and all the submodules with...
git clone --reference-if-able \\fileserver\ref-repo --recursive
g...@github.com:company/main
...then it all works, all the sub-modules get cloned and the it's fast.

great. :)


Now in my case I am working with Jenkins jobs and I need to first do a
clone, and then get the sub-modules, but if I do...
git clone --reference-if-able \\fileserver\ref-repo
g...@github.com:company/main (so non-recursive)
cd main
git submodule update --init --reference \\fileserver\ref-repo
... then this takes ages, as it would normally do without the use of
--reference. I suspect it's not actually using it.

So to confirm your suspicion, can you run

   GIT_TRACE=1 git clone ...
   cd main && GIT_TRACE=1 git submodule update ...

to see which child processes are spawned to deal with the submodules?
Also to confirm, it is the "submodule update" that is taking so long for you?

Yes, I confirm it's the "submodule update" which is taking a long time.
The clone with the reference is definitely working.


Running git submodule update with  "GIT_TRACE=1", here is a snippet of 
what I get:


10:14:44.924684 git.c:596   trace: exec: 'git-submodule' 
'update' '--init' '--reference' 
'\\fileserver\Builds\reference_repos\main-repo'
10:14:44.925684 run-command.c:369   trace: run_command: 
'git-submodule' 'update' '--init' '--reference' 
'\\fileserver\Builds\reference_repos\main-repo'
10:14:45.146488 git.c:596   trace: exec: 
'git-sh-i18n--envsubst' '--variables' 'usage: $dashless $USAGE'
10:14:45.146488 run-command.c:369   trace: run_command: 
'git-sh-i18n--envsubst' '--variables' 'usage: $dashless $USAGE'
10:14:45.231548 git.c:596   trace: exec: 
'git-sh-i18n--envsubst' 'usage: $dashless $USAGE'
10:14:45.231548 run-command.c:369   trace: run_command: 
'git-sh-i18n--envsubst' 'usage: $dashless $USAGE'
10:14:45.357059 git.c:371   trace: built-in: git 'rev-parse' 
'--git-dir'
10:14:45.427806 git.c:371   trace: built-in: git 'rev-parse' 
'--git-path' 'objects'
10:14:45.487348 git.c:371   trace: built-in: git 'rev-parse' 
'-q' '--git-dir'
10:14:45.593794 git.c:371   trace: built-in: git 'rev-parse' 
'--show-prefix'
10:14:45.643162 git.c:371   trace: built-in: git 'rev-parse' 
'--show-toplevel'
10:14:45.700201 git.c:371   trace: built-in: git 
'submodule--helper' 'init'
10:14:45.986024 git.c:371   trace: built-in: git 
'submodule--helper' 'update-clone' 
'--reference=\\fileserver\Builds\reference_repos\main-repo'
10:14:45.988024 run-command.c:1155  run_processes_parallel: 
preparing to run up to 1 tasks
10:14:45.988024 run-command.c:369   trace: run_command: 
'submodule--helper' 'clone' '--path' 'lib1/lib1_source' '--name' 
'lib1/lib1_source' '--url' 'g...@github.com:company/main-repo-lib1.git' 
'--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:15:06.204872 run-command.c:369   trace: run_command: 
'submodule--helper' 'clone' '--path' 'lib2/lib2_source' '--name' 
'lib2/lib2_source' '--url' 'g...@github.com:company/main-repo-lib2.git' 
'--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:14:46.02 git.c:371   trace: built-in: git 
'submodule--helper' 'clone' '--path' 'lib1/lib1_source' '--name' 
'lib1/lib1_source' '--url' 'g...@github.com:company/main-repo-lib1.git' 
'--reference' '\\fileserver\Builds\reference_repos\main-repo'
10:14:46.027555 run-command.c:369   trace: run_command: 'clone' 
'--no-checkout' '--reference' 
'\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 
'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib1/lib1_source' 
'g...@github.com:company/main-repo-lib1.git' 
'D:/tmp2/git_clone_tests/main-repo/lib1/lib1_source'
10:14:46.061305 git.c:371   trace: built-in: git 'clone' 
'--no-checkout' '--reference' 
'\\fileserver\Builds\reference_repos\main-repo' '--separate-git-dir' 
'D:/tmp2/git_clone_tests/main-repo/.git/modules/lib1/lib1_source' 
'g...@github.com:company/main-repo-lib1.git' 

Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread Jeff King
On Mon, Apr 03, 2017 at 05:03:49PM +, David Turner wrote:

> > > Unfortunately, in order to push some large repos, the http postbuffer
> > > must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> > > we just malloc a larger buffer.
> > 
> > I'm still not sure why a 2GB post-buffer is necessary. It sounds like 
> > something
> > is broken in your setup. Large pushes should be sent chunked.
> > 
> > I know broken setups are a fact of life, but this feels like a really hacky 
> > work-
> > around.
> 
> I'm not sure what other workaround I should use.  I guess I could do
> multiple pushes, but only if individual objects are under the size
> limit, and I'm not sure all of mine are (maybe I'll get lucky tho).  I
> know that this is a configuration issue with gitlab:
> https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know
> when that will get fixed.  I could manually copy the repo to the
> server and do a local push, but I don't know that I have the necessary
> permissions to do that. Or I could do this, which would hopefully
> actually solve the problem.

I didn't think we had gotten details on what was actually broken. Is it
really that GitLab does not support chunked transfers? I know that's
what the issue above says, but it sounds like it is just assuming that
is the problem based on the recent messages to the list.

If that's really the issue, then OK. That's lame, but something the
client has to work around. It seems like a pretty big gap, though (and
one I'd think people would have hit before; the default post-buffer is
only 1MB, and surely people routinely push more than that to GitLab
servers). So I'm really wondering if there is something else going on
here.

What does it look like when it fails? What does GIT_TRACE_CURL look like
(or GIT_CURL_VERBOSE if your client is older, but remember to sanitize
any auth lines)?
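
For the record, collecting such a trace might look like this (a sketch
with stand-in paths; the push goes to a local remote only so the
commands are self-contained, which means the curl log stays empty — a
real https remote fills it with the request/response headers and bodies
that need sanitizing):

```shell
# Set up a throwaway repo and remote (stand-ins for the real ones).
git init --bare /tmp/trace-remote.git
git init /tmp/trace-clone
cd /tmp/trace-clone
git config user.email you@example.com
git config user.name "Trace Demo"
git commit --allow-empty -m demo

# GIT_TRACE_CURL (git >= 2.11) writes the HTTP conversation to a file;
# older clients can use GIT_CURL_VERBOSE=1 instead.
GIT_TRACE_CURL=/tmp/push-trace.log git push /tmp/trace-remote.git HEAD
```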

> > The ultimate fate of this number, though, is to be handed to:
> > 
> >   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> > 
> > where the final argument is interpreted as a long. So I suspect that on 
> > 64-bit
> > Windows, setting http.postbuffer to "3G" would cause some kind of weird
> > error (either a truncated post or some internal curl error due to the 
> > negative
> > size, depending on how curl handles it).
> 
> Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.  Will re-roll.

Ah, neat. I didn't even know about CURLOPT_POSTFIELDSIZE_LARGE, and
thought we'd have to just limit 32-bit platforms. That's a much better
solution.

> > I saw the earlier iteration used a size_t, but you switched it after the 
> > compiler
> > (rightfully) complained about the signedness. But I'm not sure why we would
> > want ssize_t here instead of just using git_parse_unsigned().
> 
> It was originally signed.  I'm not sure why that was, but I figured it
> would be simpler to save the extra bit just in case.

I think it was simply because git_config_int() is the generic "number"
parser, and nobody ever thought about it.

In fact, passing a negative value to curl would be disastrous, as it
would use strlen(). I don't think a negative value could ever get that
far, though. It looks like the config code would silently turn a
negative value into LARGE_PACKET_MAX.

IMHO, complaining about the negative number to the user would be an
improvement.

-Peff


Re: [PATCH v5] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread Jonathan Nieder
David Turner wrote:

> This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
> buffer size.

Neat.

For completeness, it's useful to know this was added in curl 7.11.1,
which is old enough for us to be able to count on users having it (in
fact it was released >10 years ago).

[...]
> +++ b/remote-curl.c
> @@ -531,6 +531,12 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results)
>   return err;
>  }
>  
> +static curl_off_t xcurl_off_t(ssize_t len) {
> + if (len > (curl_off_t) len)
> + die("Cannot handle pushes this big");

nit: other calls to die() here and elsewhere tend to use a lowercase
error message.

More importantly, converting a value to a signed type when the value
cannot be represented in it yields implementation-defined behavior
(C99 section 6.3.1.3 "signed and unsigned integers").  That makes it
fodder for over-eager optimizers.

Would something like the following work?

With that change,
Reviewed-by: Jonathan Nieder 

diff --git i/remote-curl.c w/remote-curl.c
index b7b69e096a..cf171b1bc9 100644
--- i/remote-curl.c
+++ w/remote-curl.c
@@ -532,8 +532,8 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results)
 }
 
 static curl_off_t xcurl_off_t(ssize_t len) {
-   if (len > (curl_off_t) len)
-   die("Cannot handle pushes this big");
+   if (len > maximum_signed_value_of_type(curl_off_t))
+   die("cannot handle pushes this big");
return (curl_off_t) len;
 }
 


[PATCH v5] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread David Turner
Unfortunately, in order to push some large repos, the http postbuffer
must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
we just malloc a larger buffer.

This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
buffer size.

Signed-off-by: David Turner 
---

V5 addresses comments from Torsten Boegershausen and Ramsay Jones.  Since
I don't have a 32-bit machine handy, it's difficult for me to check
for compiler warnings on 32-bit machines.  Hopefully my guess as
to the solution to Ramsay's issue will be correct.

 cache.h   |  1 +
 config.c  | 17 +
 http.c|  4 ++--
 http.h|  2 +-
 remote-curl.c | 12 +---
 5 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index fbdf7a815a..5e6747dbb4 100644
--- a/cache.h
+++ b/cache.h
@@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
 extern int git_config_int(const char *, const char *);
 extern int64_t git_config_int64(const char *, const char *);
 extern unsigned long git_config_ulong(const char *, const char *);
+extern ssize_t git_config_ssize_t(const char *, const char *);
 extern int git_config_bool_or_int(const char *, const char *, int *);
 extern int git_config_bool(const char *, const char *);
 extern int git_config_maybe_bool(const char *, const char *);
diff --git a/config.c b/config.c
index 1a4d85537b..aae6dcc34e 100644
--- a/config.c
+++ b/config.c
@@ -834,6 +834,15 @@ int git_parse_ulong(const char *value, unsigned long *ret)
return 1;
 }
 
+static int git_parse_ssize_t(const char *value, ssize_t *ret)
+{
+   intmax_t tmp;
+   if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
+   return 0;
+   *ret = tmp;
+   return 1;
+}
+
 NORETURN
 static void die_bad_number(const char *name, const char *value)
 {
@@ -892,6 +901,14 @@ unsigned long git_config_ulong(const char *name, const char *value)
return ret;
 }
 
+ssize_t git_config_ssize_t(const char *name, const char *value)
+{
+   ssize_t ret;
+   if (!git_parse_ssize_t(value, &ret))
+   die_bad_number(name, value);
+   return ret;
+}
+
 int git_parse_maybe_bool(const char *value)
 {
if (!value)
diff --git a/http.c b/http.c
index 96d84bbed3..22f8167ba2 100644
--- a/http.c
+++ b/http.c
@@ -19,7 +19,7 @@ long int git_curl_ipresolve;
 #endif
 int active_requests;
 int http_is_verbose;
-size_t http_post_buffer = 16 * LARGE_PACKET_MAX;
+ssize_t http_post_buffer = 16 * LARGE_PACKET_MAX;
 
 #if LIBCURL_VERSION_NUM >= 0x070a06
 #define LIBCURL_CAN_HANDLE_AUTH_ANY
@@ -331,7 +331,7 @@ static int http_options(const char *var, const char *value, 
void *cb)
}
 
if (!strcmp("http.postbuffer", var)) {
-   http_post_buffer = git_config_int(var, value);
+   http_post_buffer = git_config_ssize_t(var, value);
if (http_post_buffer < LARGE_PACKET_MAX)
http_post_buffer = LARGE_PACKET_MAX;
return 0;
diff --git a/http.h b/http.h
index 02bccb7b0c..f7bd3b26b0 100644
--- a/http.h
+++ b/http.h
@@ -111,7 +111,7 @@ extern struct curl_slist *http_copy_default_headers(void);
 extern long int git_curl_ipresolve;
 extern int active_requests;
 extern int http_is_verbose;
-extern size_t http_post_buffer;
+extern ssize_t http_post_buffer;
 extern struct credential http_auth;
 
 extern char curl_errorstr[CURL_ERROR_SIZE];
diff --git a/remote-curl.c b/remote-curl.c
index e953d06f66..b7b69e096a 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -531,6 +531,12 @@ static int probe_rpc(struct rpc_state *rpc, struct 
slot_results *results)
return err;
 }
 
+static curl_off_t xcurl_off_t(ssize_t len) {
+   if (len > (curl_off_t) len)
+   die("Cannot handle pushes this big");
+   return (curl_off_t) len;
+}
+
 static int post_rpc(struct rpc_state *rpc)
 {
struct active_request_slot *slot;
@@ -614,7 +620,7 @@ static int post_rpc(struct rpc_state *rpc)
 * and we just need to send it.
 */
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
-   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, gzip_size);
+   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
xcurl_off_t(gzip_size));
 
} else if (use_gzip && 1024 < rpc->len) {
/* The client backend isn't giving us compressed data so
@@ -645,7 +651,7 @@ static int post_rpc(struct rpc_state *rpc)
 
headers = curl_slist_append(headers, "Content-Encoding: gzip");
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
-   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, gzip_size);
+   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
xcurl_off_t(gzip_size));
 
if (options.verbosity > 1) {
fprintf(stderr, "POST %s (gzip 

RE: [PATCH v4] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread David Turner


> -Original Message-
> From: Ramsay Jones [mailto:ram...@ramsayjones.plus.com]
> Sent: Monday, April 3, 2017 6:24 PM
> To: David Turner ; git@vger.kernel.org
> Subject: Re: [PATCH v4] http.postbuffer: allow full range of ssize_t values
> 
> 
> 
> On 03/04/17 18:30, David Turner wrote:
> > Unfortunately, in order to push some large repos, the http postbuffer
> > must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> > we just malloc a larger buffer.
> >
> > This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set
> the
> > buffer size.
> >
> > Signed-off-by: David Turner 
> > ---
> >  cache.h   |  1 +
> >  config.c  | 17 +
> >  http.c|  4 ++--
> >  http.h|  2 +-
> >  remote-curl.c |  6 +++---
> >  5 files changed, 24 insertions(+), 6 deletions(-)
> >
> > diff --git a/cache.h b/cache.h
> > index fbdf7a815a..5e6747dbb4 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
> >  extern int git_config_int(const char *, const char *);
> >  extern int64_t git_config_int64(const char *, const char *);
> >  extern unsigned long git_config_ulong(const char *, const char *);
> > +extern ssize_t git_config_ssize_t(const char *, const char *);
> >  extern int git_config_bool_or_int(const char *, const char *, int *);
> >  extern int git_config_bool(const char *, const char *);
> >  extern int git_config_maybe_bool(const char *, const char *);
> > diff --git a/config.c b/config.c
> > index 1a4d85537b..de5b155a4e 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -834,6 +834,15 @@ int git_parse_ulong(const char *value, unsigned long *ret)
> > 	return 1;
> >  }
> >
> > +static int git_parse_ssize_t(const char *value, ssize_t *ret)
> > +{
> > +   ssize_t tmp;
> > +   if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
> > +   return 0;
> > +   *ret = tmp;
> > +   return 1;
> > +}
> > +
> 
> The previous version of this patch causes gcc to complain on a Linux 32bit
> build, like so:
> 
> CC config.o
> config.c: In function ‘git_parse_ssize_t’:
> config.c:840:31: warning: passing argument 2 of ‘git_parse_signed’ from
> incompatible pointer type [-Wincompatible-pointer-types]
>   if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
>    ^
> config.c:753:12: note: expected ‘intmax_t * {aka long long int *}’ but
> argument is of type ‘ssize_t * {aka int *}’
>  static int git_parse_signed(const char *value, intmax_t *ret, intmax_t max)
> ^
> 
> ... and I don't think this version would be any different (though I haven't
> tried).

So I guess that means tmp needs to be an intmax_t.


Re: [PATCH v2 4/8] Specify explicitly where we parse timestamps

2017-04-03 Thread Johannes Schindelin
Hi Torsten,

On Mon, 3 Apr 2017, Torsten Bögershausen wrote:

> []
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -319,6 +319,8 @@ extern char *gitdirname(char *);
> >  #define PRIo32 "o"
> >  #endif
> >
> > +#define parse_timestamp strtoul
> > +
> 
> Would
> #define parse_timestamp(a,b,c)  strtoul((a),(b),(c))
> be more strict ?

It would be more pedantic. But I fail to see how it would be more strict:
once the preprocessor has replaced `parse_timestamp` with whatever we
#define'd it to, the C compiler will then validate the calls against the
signature of the real function. That validation will be much more strict
than merely testing for the correct number of parameters.
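For illustration, a minimal demonstration (the macro is from the patch; the
wrapper function and the sample timestamp are made up):

```c
#include <assert.h>
#include <stdlib.h>

/* Object-like form, as in the patch: */
#define parse_timestamp strtoul

/*
 * The function-like form
 *     #define parse_timestamp(a, b, c) strtoul((a), (b), (c))
 * would only add an arity check at expansion time.  Either way the
 * expansion is a plain call to strtoul(), so the compiler checks the
 * arguments against its prototype (const char *, char **, int) and
 * warns about, say, passing an int where a char ** is expected.
 */
static unsigned long demo(void)
{
	return parse_timestamp("1491253200", NULL, 10);
}
```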

Ciao,
Johannes

Re: [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance

2017-04-03 Thread Stefan Beller
On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira  wrote:
> Create inline helpers to dir_iterator_advance(). Make
> dir_iterator_advance()'s code more legible and allow some behavior to
> be reusable.
>
> Signed-off-by: Daniel Ferreira 
> ---


> +static inline void push_dir_level(struct dir_iterator_int *iter, struct 
> dir_iterator_level *level)

Sometimes we use inline, sometimes we do not:
$ git grep "static inline" -- *.c |wc -l
84
$ git grep "static inline" -- *.h |wc -l
130
# approximate a function definition:
$ git grep -e "static" -- *.c  |grep "("|wc -l
2971

So I think we'd want to have the inline keyword in
the header only (when the code is actually to be inlined).
As a C file is one translation unit, the compiler can figure
out best whether to inline a particular static function on
its own.
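For illustration, a sketch of that convention (the file placement is
described in the comments; the helper names are made up):

```c
#include <assert.h>

/*
 * In a header (say, some-api.h), "static inline" is needed: the body
 * must be visible in every translation unit that includes it.
 */
static inline int header_style_helper(int x)
{
	return x + 1;
}

/*
 * In a .c file, plain "static" suffices; within one translation unit
 * the compiler decides on its own whether inlining pays off.
 */
static int c_file_style_helper(int x)
{
	return x * 2;
}
```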

The rest looks good to me,

Thanks,
Stefan


Re: [PATCH v2 1/8] ref-filter: avoid using `unsigned long` for catch-all data type

2017-04-03 Thread Johannes Schindelin
Hi Torsten,

On Mon, 3 Apr 2017, Torsten Bögershausen wrote:

> On 02/04/17 21:06, Johannes Schindelin wrote:
> > In its `atom_value` struct, the ref-filter source code wants to store
> > different values in a field called `ul` (for `unsigned long`), e.g.
> > timestamps.
> >
> > However, as we are about to switch the data type of timestamps away from
> > `unsigned long` (because it may be 32-bit even when `time_t` is 64-bit),
> > that data type is not large enough.
> >
> > Simply change that field to use `uintmax_t` instead.
> >
> > This patch is a bit larger than the mere change of the data type
> > because the field's name was tied to its data type, which has been fixed
> > at the same time.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  ref-filter.c | 16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 9c82b5b9d63..8538328fc7f 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -351,7 +351,7 @@ struct ref_formatting_state {
> >  struct atom_value {
> >   const char *s;
> >   void (*handler)(struct atom_value *atomv, struct ref_formatting_state
> >   *state);
> > -   unsigned long ul; /* used for sorting when not FIELD_STR */
> > +   uintmax_t value; /* used for sorting when not FIELD_STR */
> > struct used_atom *atom;
> >  };
> >
> > @@ -723,7 +723,7 @@ static void grab_common_values(struct atom_value *val,
> > int deref, struct object
> >if (!strcmp(name, "objecttype"))
> > v->s = typename(obj->type);
> > else if (!strcmp(name, "objectsize")) {
> > -   v->ul = sz;
> > +   v->value = sz;
> > v->s = xstrfmt("%lu", sz);
> >}
> >else if (deref)
> > @@ -770,8 +770,8 @@ static void grab_commit_values(struct atom_value *val,
> > int deref, struct object
> > v->s = xstrdup(oid_to_hex(&commit->tree->object.oid));
> >}
> >else if (!strcmp(name, "numparent")) {
> > -   v->ul = commit_list_count(commit->parents);
> > -   v->s = xstrfmt("%lu", v->ul);
> > +   v->value = commit_list_count(commit->parents);
> > +   v->s = xstrfmt("%lu", (unsigned long)v->value);
> 
> If we want to get rid of "%lu" some day, we can do it like this:
> v->s = xstrfmt("%" PRIuMAX, v->value);
> Or, to make clear that under all circumstances an unsigned long is big
> enough to hold the counter, for readers in the future, use something
> like this:
>   v->s = xstrfmt("%lu", (xulong_t)v->value);

We could do that, yes.

But part of my patch series is to clarify in a semantic way what the
purpose of the code is. Your solution would keep it syntactically correct,
of course, but it would also make it semantically unclear again.

By writing "%"PRIutime *instead of* "%"PRIuMAX, we are saying: look, we
are talking about a timestamp here. That would not at all be clear if we
wrote "%"PRIuMAX.
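For illustration, a stand-alone sketch of the type-safety point (snprintf
stands in for git's xstrfmt, and the helper name is made up):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a uintmax_t counter.  "%" PRIuMAX always matches the type;
 * the (unsigned long) cast variant is only safe while the value fits
 * in an unsigned long, which can be 32-bit even on platforms with a
 * 64-bit time_t.
 */
static const char *format_count(uintmax_t value, char *buf, size_t len)
{
	snprintf(buf, len, "%" PRIuMAX, value);
	return buf;
}
```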

Ciao,
Johannes

Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-03 Thread Eric Wong
Ævar Arnfjörð Bjarmason  wrote:
> On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong  wrote:
> > Ævar Arnfjörð Bjarmason  wrote:
> >>  - Should we be covering good practices for your repo going forward to
> >>maintain good performance? E.g. don't have some huge tree all in
> >>one directory (use subdirs), don't add binary (rather
> >>un-delta-able) content if you can help it etc.
> >
> > Yes, I think so.
> 
> I'll try to write something up.
> 
> > I think avoiding ever growing ChangeLog-type files should also
> > be added to things to avoid.
> 
> How were those bad specifically? They should delta quite well; it's
> expensive to commit large files, but no more so just because they're
> ever-growing.

It might be blame/annotate specifically; I was remembering this
thread from a decade ago:

  
https://public-inbox.org/git/4aca3dc20712110933i636342fbifb15171d3e3ca...@mail.gmail.com/T/

> One issue with e.g. storing logs (I keep my IRC logs in git) is that
> if you're constantly committing large (text) files without repacking,
> your .git grows by a *lot* in a very short amount of time until a very
> expensive repack, so now I split my IRC logs by month.

Yep, that too; auto GC is triggered by the number of loose
objects, not their size or packability.


Re: [PATCH v7 2/5] remove_subtree(): test removing nested directories

2017-04-03 Thread Stefan Beller
On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira  wrote:
> Test removing a nested directory when an attempt is made to restore the
> index to a state where it does not exist. A similar test could be found
> previously in t/t2000-checkout-cache-clash.sh, but it did not check for
> nested directories, which could allow a faulty implementation of
> remove_subtree() pass the tests.
>
> Signed-off-by: Daniel Ferreira 
> ---
>  t/t2000-checkout-cache-clash.sh | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
> index de3edb5..ac10ba3 100755
> --- a/t/t2000-checkout-cache-clash.sh
> +++ b/t/t2000-checkout-cache-clash.sh
> @@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with 
> --prefix' '
> git checkout-index -a -f --prefix=there/
>  '
>
> +test_expect_success 'git checkout-index -f should remove nested subtrees' '
> +   echo content >path &&
> +   git update-index --add path &&
> +   rm path &&
> +   mkdir -p path/with/nested/paths &&
> +   echo content >path/file1 &&
> +   echo content >path/with/nested/paths/file2 &&
> +   git checkout-index -f -a &&
> +   test ! -d path
> +'
> +

Looks good to me.

Thanks,
Stefan


Re: [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API

2017-04-03 Thread Stefan Beller
On Sun, Apr 2, 2017 at 1:03 PM, Daniel Ferreira  wrote:
> Create t/helper/test-dir-iterator.c, which prints relevant information
> about a directory tree iterated over with dir_iterator.
>
> Create t/t0065-dir-iterator.sh, which tests that dir_iterator does
> iterate through a whole directory tree.
>
> Signed-off-by: Daniel Ferreira 
> ---
>  Makefile |  1 +
>  t/helper/.gitignore  |  1 +
>  t/helper/test-dir-iterator.c | 28 +++
>  t/t0065-dir-iterator.sh  | 54 
> 
>  4 files changed, 84 insertions(+)
>  create mode 100644 t/helper/test-dir-iterator.c
>  create mode 100755 t/t0065-dir-iterator.sh
>
> diff --git a/Makefile b/Makefile
> index 9b36068..41ce9ab 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -614,6 +614,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
>  TEST_PROGRAMS_NEED_X += test-config
>  TEST_PROGRAMS_NEED_X += test-date
>  TEST_PROGRAMS_NEED_X += test-delta
> +TEST_PROGRAMS_NEED_X += test-dir-iterator
>  TEST_PROGRAMS_NEED_X += test-dump-cache-tree
>  TEST_PROGRAMS_NEED_X += test-dump-split-index
>  TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
> diff --git a/t/helper/.gitignore b/t/helper/.gitignore
> index 758ed2e..a7d74d3 100644
> --- a/t/helper/.gitignore
> +++ b/t/helper/.gitignore
> @@ -3,6 +3,7 @@
>  /test-config
>  /test-date
>  /test-delta
> +/test-dir-iterator
>  /test-dump-cache-tree
>  /test-dump-split-index
>  /test-dump-untracked-cache
> diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
> new file mode 100644
> index 000..06f03fc
> --- /dev/null
> +++ b/t/helper/test-dir-iterator.c
> @@ -0,0 +1,28 @@
> +#include "git-compat-util.h"
> +#include "strbuf.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
> +
> +int cmd_main(int argc, const char **argv) {
> +   struct strbuf path = STRBUF_INIT;
> +   struct dir_iterator *diter;
> +
> +   if (argc < 2) {

Coding style: please use no braces for single statements after
control structures (if, for, while).

Instead of just returning 1 (which is harder to diagnose in an
interactive shell for users who do not know the details of this
program), we could

die("BUG: test-dir-iterator takes exactly one argument");

> +   return 1;
> +   }
> +
> +   strbuf_add(, argv[1], strlen(argv[1]));
> +
> +   diter = dir_iterator_begin(path.buf);
> +
> +   while (dir_iterator_advance(diter) == ITER_OK) {
> +   if (S_ISDIR(diter->st.st_mode))
> +   printf("[d] ");
> +   else
> +   printf("[f] ");

In the operating system there are more 'st_mode's than
just directory or file, e.g. symbolic link.

Maybe

else if (S_ISREG(diter->st.st_mode))
printf("[f] ");
else
printf("[?] ");

to catch the unknown things better instead of
defaulting them to regular files?

> +#!/bin/sh
> +
> +test_description='Test directory iteration.'
> +
> +. ./test-lib.sh
> +
> +cat >expect-sorted-output <<-\EOF &&
> +[d] (a) ./dir/a
> +[d] (a/b) ./dir/a/b
> +[d] (a/b/c) ./dir/a/b/c
> +[d] (d) ./dir/d
> +[d] (d/e) ./dir/d/e
> +[d] (d/e/d) ./dir/d/e/d
> +[f] (a/b/c/d) ./dir/a/b/c/d
> +[f] (a/e) ./dir/a/e
> +[f] (b) ./dir/b
> +[f] (c) ./dir/c
> +[f] (d/e/d/a) ./dir/d/e/d/a
> +EOF
> +
> +test_expect_success 'dir-iterator should iterate through all files' '
> +   mkdir -p dir &&
> +   mkdir -p dir/a/b/c/ &&
> +   >dir/b &&
> +   >dir/c &&
> +   mkdir -p dir/d/e/d/ &&
> +   >dir/a/b/c/d &&
> +   >dir/a/e &&
> +   >dir/d/e/d/a &&
> +
> +   test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&

We just had some micro projects that were trying to get rid of git
commands on the upstream of a pipe, to better observe
the exit codes of the programs that we write here. I think it is
not as critical for test helpers, but it would be better to diagnose
an error when we had

test-dir-iterator ./dir >out &&
sort <out >actual &&
test_cmp expect actual

as in case of an error we'd stop early in the && chain at test-dir-iterator
(and if that would use the die() function we'd even get some error message
on stderr).
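For illustration, outside of test-lib the suggested shape boils down to
this (plain cmp stands in for the test_cmp helper):

```sh
#!/bin/sh
# The command under test writes to a file instead of into a pipe, so a
# failing exit code anywhere stops the && chain instead of being masked.
printf '%s\n' b c a >out &&
sort <out >actual &&
printf '%s\n' a b c >expect &&
cmp expect actual
```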

> +   rm -rf dir &&

I'd put that cleanup first via
test_expect_success 'title' '
test_when_finished "rm -rf dir" &&
mkdir ... &&
...
'

By doing so, an inspection is easier afterwards.
When this test breaks, you'd want to debug it via e.g.

  cd t
  ./t0065-dir-iterator.sh -d -i -v -x # see README for all the nice flags)

which in case of error would stop at e.g. test_cmp failing. But at that
point of the test we already removed the files so it is impossible to inspect
what was on the file system, without adding a test_pause before the rm.
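For illustration, a minimal stand-in for test_when_finished (an
approximation only -- the real helper in git's test-lib queues the command
for the framework's end-of-test cleanup) shows why registering cleanup
first still runs it last:

```sh
#!/bin/sh
# Rough stand-in for test-lib's test_when_finished: queue a command to
# run at the end of the test.  Registering "rm -rf dir" *first* still
# runs it *last*, so a failed test leaves "dir" around for inspection.
cleanup=:
test_when_finished () {
	cleanup="$cleanup && $1"
}

test_when_finished "rm -rf dir" &&
mkdir -p dir/a/b &&
: >dir/a/b/file &&
test -f dir/a/b/file &&     # still inspectable during the test body
eval "$cleanup" &&          # emulates the end-of-test cleanup hook
test ! -d dir
```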

> +
> +   test_cmp expect-sorted-output actual-pre-order-sorted-output

Usually the file names are shorter (expect, actual), but I do not
mind encoding some expectation in them.

Thanks,

Re: [PATCH v4] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread Ramsay Jones


On 03/04/17 18:30, David Turner wrote:
> Unfortunately, in order to push some large repos, the http postbuffer
> must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> we just malloc a larger buffer.
> 
> This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
> buffer size.
> 
> Signed-off-by: David Turner 
> ---
>  cache.h   |  1 +
>  config.c  | 17 +
>  http.c|  4 ++--
>  http.h|  2 +-
>  remote-curl.c |  6 +++---
>  5 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index fbdf7a815a..5e6747dbb4 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
>  extern int git_config_int(const char *, const char *);
>  extern int64_t git_config_int64(const char *, const char *);
>  extern unsigned long git_config_ulong(const char *, const char *);
> +extern ssize_t git_config_ssize_t(const char *, const char *);
>  extern int git_config_bool_or_int(const char *, const char *, int *);
>  extern int git_config_bool(const char *, const char *);
>  extern int git_config_maybe_bool(const char *, const char *);
> diff --git a/config.c b/config.c
> index 1a4d85537b..de5b155a4e 100644
> --- a/config.c
> +++ b/config.c
> @@ -834,6 +834,15 @@ int git_parse_ulong(const char *value, unsigned long 
> *ret)
>   return 1;
>  }
>  
> +static int git_parse_ssize_t(const char *value, ssize_t *ret)
> +{
> + ssize_t tmp;
> + if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
> + return 0;
> + *ret = tmp;
> + return 1;
> +}
> +

The previous version of this patch causes gcc to complain on
a Linux 32bit build, like so:

CC config.o
config.c: In function ‘git_parse_ssize_t’:
config.c:840:31: warning: passing argument 2 of ‘git_parse_signed’ from 
incompatible pointer type [-Wincompatible-pointer-types]
  if (!git_parse_signed(value, &tmp, maximum_signed_value_of_type(ssize_t)))
   ^
config.c:753:12: note: expected ‘intmax_t * {aka long long int *}’ but argument 
is of type ‘ssize_t * {aka int *}’
 static int git_parse_signed(const char *value, intmax_t *ret, intmax_t max)
^

... and I don't think this version would be any different (though I
haven't tried).

ATB,
Ramsay Jones



Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-03 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong  wrote:
> Ævar Arnfjörð Bjarmason  wrote:
>> Add a new manpage that gives an overview of how to tweak git's
>> performance.
>>
>> There's currently no good single resource for things a git site
>> administrator might want to look into to improve performance for his
>> site & his users. This unfinished documentation aims to be the first
>> thing someone might want to look at when investigating ways to improve
>> git performance.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>
>> I've been wanting to get something like this started for a while. It's
>> obviously woefully incomplete. Pointers about what to include would be
>> great & whether including something like this makes sense.
>
> Thanks for doing this.  I hope something like this can give
> server operators more confidence to host their own git servers.
>
>> Things I have on my TODO list:
>
> 
>
>>  - Should we be covering good practices for your repo going forward to
>>maintain good performance? E.g. don't have some huge tree all in
>>one directory (use subdirs), don't add binary (rather
>>un-delta-able) content if you can help it etc.
>
> Yes, I think so.

I'll try to write something up.

> I think avoiding ever growing ChangeLog-type files should also
> be added to things to avoid.

How were those bad specifically? They should delta quite well; it's
expensive to commit large files, but no more so just because they're
ever-growing.

One issue with e.g. storing logs (I keep my IRC logs in git) is that
if you're constantly committing large (text) files without repacking,
your .git grows by a *lot* in a very short amount of time until a very
expensive repack, so now I split my IRC logs by month.

But I'm probably forgetting some obvious case where the ChangeLog
use-case is bad.

>> --- /dev/null
>> +++ b/Documentation/gitperformance.txt
>> @@ -0,0 +1,107 @@
>> +giteveryday(7)
>
> gitperformance(7)

Oops, thanks.

>> +Server options to help clients
>> +~~
>> +
>> +These features can be enabled on git servers, they won't help the
>> +performance of the servers themselves,
>
> Is that true for bitmaps?  I thought they reduced CPU usage on
> the server side...

I'm not sure, JK? From my reading of the repack.writeBitmaps docs it
seems to only help clone/fetch for the client, but maybe they do more
than that.

I also see we should mention pack.writeBitmapHashCache, which
according to my reading of v2.0.0-rc0~13^2~8 only helps clone/fetch.

> A sidenote: I wonder if bitmaps should be the default for bare
> repos, since bare repos are likely used on servers.
>
>> but will help clients that need
>> +to talk to those servers.
>> +
>> +- config: "repack.writeBitmaps=true" (see
>> +  linkgit:git-config[1]). Spend more time during repack to produce
>> +  bitmap index, helps clients with "fetch" & "clone" performance.


[GSoC][PATCH v5] t2027: avoid using pipes

2017-04-03 Thread Prathamesh Chavan
Whenever a git command is present in the upstream of a pipe, its failure
gets masked by piping. Hence we should avoid it for testing the
upstream git command. By writing the output of the git command to
a file, we can test the exit codes of both commands, since a failing
exit code in either one stops the && chain.

Signed-off-by: Prathamesh Chavan 
---

In this new version of the patch, I resolved the grammar mistakes
from the commit message. Thanks for pointing it out.

 t/t2027-worktree-list.sh | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/t/t2027-worktree-list.sh b/t/t2027-worktree-list.sh
index 848da5f36..720063bf0 100755
--- a/t/t2027-worktree-list.sh
+++ b/t/t2027-worktree-list.sh
@@ -20,7 +20,7 @@ test_expect_success 'rev-parse --git-common-dir on main 
worktree' '
 
 test_expect_success 'rev-parse --git-path objects linked worktree' '
echo "$(git rev-parse --show-toplevel)/.git/objects" >expect &&
-   test_when_finished "rm -rf linked-tree && git worktree prune" &&
+   test_when_finished "rm -rf linked-tree actual expect && git worktree 
prune" &&
git worktree add --detach linked-tree master &&
git -C linked-tree rev-parse --git-path objects >actual &&
test_cmp expect actual
@@ -28,19 +28,21 @@ test_expect_success 'rev-parse --git-path objects linked 
worktree' '
 
 test_expect_success '"list" all worktrees from main' '
echo "$(git rev-parse --show-toplevel) $(git rev-parse --short HEAD) 
[$(git symbolic-ref --short HEAD)]" >expect &&
-   test_when_finished "rm -rf here && git worktree prune" &&
+   test_when_finished "rm -rf here out actual expect && git worktree 
prune" &&
git worktree add --detach here master &&
echo "$(git -C here rev-parse --show-toplevel) $(git rev-parse --short 
HEAD) (detached HEAD)" >>expect &&
-   git worktree list | sed "s/  */ /g" >actual &&
+   git worktree list >out &&
+   sed "s/  */ /g" <out >actual &&
test_cmp expect actual
 '
 
 test_expect_success '"list" all worktrees from linked' '
echo "$(git rev-parse --show-toplevel) $(git rev-parse --short HEAD) 
[$(git symbolic-ref --short HEAD)]" >expect &&
-   test_when_finished "rm -rf here && git worktree prune" &&
+   test_when_finished "rm -rf here out actual expect && git worktree 
prune" &&
git worktree add --detach here master &&
echo "$(git -C here rev-parse --show-toplevel) $(git rev-parse --short 
HEAD) (detached HEAD)" >>expect &&
-   git -C here worktree list | sed "s/  */ /g" >actual &&
+   git -C here worktree list >out &&
+   sed "s/  */ /g" <out >actual &&
test_cmp expect actual
 '
 
@@ -49,7 +51,7 @@ test_expect_success '"list" all worktrees --porcelain' '
echo "HEAD $(git rev-parse HEAD)" >>expect &&
echo "branch $(git symbolic-ref HEAD)" >>expect &&
echo >>expect &&
-   test_when_finished "rm -rf here && git worktree prune" &&
+   test_when_finished "rm -rf here actual expect && git worktree prune" &&
git worktree add --detach here master &&
echo "worktree $(git -C here rev-parse --show-toplevel)" >>expect &&
echo "HEAD $(git rev-parse HEAD)" >>expect &&
@@ -69,16 +71,17 @@ test_expect_success 'bare repo setup' '
 '
 
 test_expect_success '"list" all worktrees from bare main' '
-   test_when_finished "rm -rf there && git -C bare1 worktree prune" &&
+   test_when_finished "rm -rf there out actual expect && git -C bare1 
worktree prune" &&
git -C bare1 worktree add --detach ../there master &&
echo "$(pwd)/bare1 (bare)" >expect &&
echo "$(git -C there rev-parse --show-toplevel) $(git -C there 
rev-parse --short HEAD) (detached HEAD)" >>expect &&
-   git -C bare1 worktree list | sed "s/  */ /g" >actual &&
+   git -C bare1 worktree list >out &&
+   sed "s/  */ /g" <out >actual &&
test_cmp expect actual
 '
 
 test_expect_success '"list" all worktrees --porcelain from bare main' '
-   test_when_finished "rm -rf there && git -C bare1 worktree prune" &&
+   test_when_finished "rm -rf there actual expect && git -C bare1 worktree 
prune" &&
git -C bare1 worktree add --detach ../there master &&
echo "worktree $(pwd)/bare1" >expect &&
echo "bare" >>expect &&
@@ -92,11 +95,12 @@ test_expect_success '"list" all worktrees --porcelain from 
bare main' '
 '
 
 test_expect_success '"list" all worktrees from linked with a bare main' '
-   test_when_finished "rm -rf there && git -C bare1 worktree prune" &&
+   test_when_finished "rm -rf there out actual expect && git -C bare1 
worktree prune" &&
git -C bare1 worktree add --detach ../there master &&
echo "$(pwd)/bare1 (bare)" >expect &&
echo "$(git -C there rev-parse --show-toplevel) $(git -C there 
rev-parse --short HEAD) (detached HEAD)" >>expect &&
-   git -C there worktree list | 

Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-03 Thread Eric Wong
Ævar Arnfjörð Bjarmason  wrote:
> Add a new manpage that gives an overview of how to tweak git's
> performance.
> 
> There's currently no good single resource for things a git site
> administrator might want to look into to improve performance for his
> site & his users. This unfinished documentation aims to be the first
> thing someone might want to look at when investigating ways to improve
> git performance.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> 
> I've been wanting to get something like this started for a while. It's
> obviously woefully incomplete. Pointers about what to include would be
> great & whether including something like this makes sense.

Thanks for doing this.  I hope something like this can give
server operators more confidence to host their own git servers.

> Things I have on my TODO list:



>  - Should we be covering good practices for your repo going forward to
>maintain good performance? E.g. don't have some huge tree all in
>one directory (use subdirs), don't add binary (rather
>un-delta-able) content if you can help it etc.

Yes, I think so.

I think avoiding ever growing ChangeLog-type files should also
be added to things to avoid.

> --- /dev/null
> +++ b/Documentation/gitperformance.txt
> @@ -0,0 +1,107 @@
> +giteveryday(7)

gitperformance(7)

> +Server options to help clients
> +~~
> +
> +These features can be enabled on git servers, they won't help the
> +performance of the servers themselves,

Is that true for bitmaps?  I thought they reduced CPU usage on
the server side...

A sidenote: I wonder if bitmaps should be the default for bare
repos, since bare repos are likely used on servers.

> but will help clients that need
> +to talk to those servers.
> +
> +- config: "repack.writeBitmaps=true" (see
> +  linkgit:git-config[1]). Spend more time during repack to produce
> +  bitmap index, helps clients with "fetch" & "clone" performance.


[PATCH/RFC] gitperformance: add new documentation about git performance tuning

2017-04-03 Thread Ævar Arnfjörð Bjarmason
Add a new manpage that gives an overview of how to tweak git's
performance.

There's currently no good single resource for things a git site
administrator might want to look into to improve performance for his
site & his users. This unfinished documentation aims to be the first
thing someone might want to look at when investigating ways to improve
git performance.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I've been wanting to get something like this started for a while. It's
obviously woefully incomplete. Pointers about what to include would be
great & whether including something like this makes sense.

Things I have on my TODO list:

 - Add a section discussing how refs impact performance, suggest
   e.g. archiving old tags if possible, or at least run "git remote
   prune origin" regularly on clients.

 - Discuss split index a bit, although I'm not very confident in
   describing what its pros & cons are.

 - Should we be covering good practices for your repo going forward to
   maintain good performance? E.g. don't have some huge tree all in
   one directory (use subdirs), don't add binary (rather
   un-delta-able) content if you can help it etc.

- The new core.checksumIndex option being discussed on-list, which
  actually drove me to finally write this up (hrm, this sounds useful,
  but unless I was watching the list I'd probably never see it...).


 Documentation/Makefile   |   1 +
 Documentation/gitperformance.txt | 107 +++
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/gitperformance.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..528aa22354 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -23,6 +23,7 @@ MAN5_TXT += gitrepository-layout.txt
 MAN5_TXT += gitweb.conf.txt
 
 MAN7_TXT += gitcli.txt
+MAN7_TXT += gitperformance.txt
 MAN7_TXT += gitcore-tutorial.txt
 MAN7_TXT += gitcredentials.txt
 MAN7_TXT += gitcvs-migration.txt
diff --git a/Documentation/gitperformance.txt b/Documentation/gitperformance.txt
new file mode 100644
index 00..0548d1e721
--- /dev/null
+++ b/Documentation/gitperformance.txt
@@ -0,0 +1,107 @@
+giteveryday(7)
+==
+
+NAME
+
+gitperformance - How to improve Git's performance
+
+SYNOPSIS
+
+
+A guide to improving Git's performance beyond the defaults.
+
+DESCRIPTION
+---
+
+Git is mostly performant by default, but ships with various
+configuration options, command-line options, etc. that might improve
+performance, but for various reasons aren't on by default.
+
+This document provides a brief overview of these features.
+
+The reader should not assume that turning on all of these features
+will increase performance; depending on the repository, workload &
+use-case, turning some of them on might severely harm performance.
+
+This document serves as a starting point for things to look into when
+it comes to improving performance, not as a checklist for things to
+enable or disable.
+
+Performance by topic
+--------------------
+
+It can be hard to divide the performance features into topics, but
+most of them fall into various well-defined buckets. E.g. there are
+features that help with the performance of "git status", and couldn't
+possibly impact repositories without working copies, and then some
+that only impact the performance of cloning from a server, or help the
+server itself etc.
+
+git status
+~~~~~~~~~~
+
+Running "git status" requires traversing the working tree & comparing
+it with the index. Several configuration options can help with its
+performance, with some trade-offs.
+
+- config: "core.untrackedCache=true" (see linkgit:git-config[1]) can
+  save on `stat(2)` calls by caching the mtime of filesystem
+  directories, and if they didn't change avoid recursing into that
+  directory to `stat(2)` every file in them.
++
+pros: Can drastically speed up "git status".
++
+cons: There's a speed hit for initially populating & maintaining the
+cache. Doesn't work on all filesystems (see `--test-untracked-cache`
+in linkgit:git-update-index[1]).
+
+- config: "status.showUntrackedFiles=no" (see
+  linkgit:git-config[1]). Skips looking for files in the working tree
+  git doesn't already know about.
++
+pros: Speeds up "git status" by making it do a lot less work.
++
+cons: If there are any new & untracked files anywhere in the working
+tree they won't be noticed by git. Makes it easy to accidentally miss
+files to "git add" before committing, or files which might impact the
+code in the working tree, but which git won't know exist.
+
+git grep
+~~~~~~~~
+
+- config: "grep.patternType=perl" (see linkgit:git-config[1]) makes
+  "git grep" use the PCRE library by default. This can be faster than
+  POSIX regular expressions in many cases.
++
+pros: Can, depending on the use-case, be faster than default "git grep".
++
+cons: Can also be slower, and in some edge cases produce different

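The configuration knobs discussed in the draft above can be tried out with a quick sketch like the one below (the throwaway repository is illustrative; whether `core.untrackedCache` actually helps depends on filesystem support, see `git update-index --test-untracked-cache`):

```shell
# Try the gitperformance knobs in a throwaway repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Cache directory mtimes to cut stat(2) calls in "git status".
git config core.untrackedCache true

# Make "git status" skip untracked files entirely (use with care).
git config status.showUntrackedFiles no

# Default "git grep" to PCRE patterns.
git config grep.patternType perl

git config core.untrackedCache   # prints "true"
```

In a real repository one would set these per-repo after measuring, rather than enabling all of them blindly.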
Re: [PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-03 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 3, 2017 at 8:53 PM,   wrote:
> Teach fsck to override core.checksumindex and always verify
> the index checksum when reading the index.

Sorry to only chime in about this at v4.

I think this patch & the documentation you added for
core.checksumindex in 1/4 would be much less confusing if you made
this a on-by-default command-line option like e.g. "full".

With this patch nothing amends the documentation to indicate that the
core.checksumindex is magically overridden when fsck runs, I think
something like this (needs amending to integrate) on top would make
this clearer:

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..19b45b1b6f 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
 [--[no-]full] [--strict] [--verbose] [--lost-found]
 [--[no-]dangling] [--[no-]progress] [--connectivity-only]
-[--[no-]name-objects] [<object>*]
+[--[no-]name-objects] [--[no-]checksum-index] [<object>*]

 DESCRIPTION
 ---
@@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
namespace, and all reflogs
object pools.  This is now default; you can turn it off
with --no-full.

+--[no-]checksum-index::
+   Validate the checksum at the end of the index file, on by
+   default, locally overrides any "core.checksumIndex" setting
+   unless negated. See linkgit:git-config[1].
+
 --connectivity-only::
Check only the connectivity of tags, commits and tree objects. By
avoiding to unpack blobs, this speeds up the operation, at the
diff --git a/builtin/fsck.c b/builtin/fsck.c
index f76e4163ab..8fe8ec1775 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -24,6 +24,7 @@ static int show_tags;
 static int show_unreachable;
 static int include_reflogs = 1;
 static int check_full = 1;
+static int fsck_opt_checksum_index = 1;
 static int connectivity_only;
 static int check_strict;
 static int keep_cache_objects;
@@ -656,6 +657,8 @@ static struct option fsck_opts[] = {
OPT_BOOL(0, "cache", &keep_cache_objects, N_("make index
objects head nodes")),
OPT_BOOL(0, "reflogs", &include_reflogs, N_("make reflogs head
nodes (default)")),
OPT_BOOL(0, "full", &check_full, N_("also consider packs and
alternate objects")),
+   OPT_BOOL(0, "checksum-index", &fsck_opt_checksum_index,
+N_("validate the checksum at the end of the index file")),
OPT_BOOL(0, "connectivity-only", &connectivity_only, N_("check
only connectivity")),
OPT_BOOL(0, "strict", &check_strict, N_("enable more strict checking")),
OPT_BOOL(0, "lost-found", &write_lost_and_found,
@@ -681,6 +684,9 @@ int cmd_fsck(int argc, const char **argv, const
char *prefix)
if (check_strict)
fsck_obj_options.strict = 1;

+   if (fsck_opt_checksum_index)
+   force_core_checksum_index = 1;
+
if (show_progress == -1)
show_progress = isatty(2);
if (verbose)


Re: [GSoC][PATCH v4] t2027: avoid using pipes

2017-04-03 Thread Stefan Beller
On Fri, Mar 24, 2017 at 5:04 AM, Prathamesh Chavan  wrote:
> Whenever a git command is present in the upstream of a pipe, its failure
> gets masked by piping and hence it should be avoided for testing the
> upstream git command. By writing out the output of the git command to
> a file, we can test the exit codes of both the commands as a failure exit
> code in any command is able to stop the && chain.

Sorry for dropping the ball here.
This patch is identical in code to the one Torsten reviewed, with a
slight grammar fix in the commit message.

> Signed-off-by: Prathamesh Chavan 

I reviewed the patch and it looks fine.

Thanks,
Stefan


Re: [PATCH v4] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread Torsten Bögershausen
On 03.04.17 19:30, David Turner wrote:
> Unfortunately, in order to push some large repos, the http postbuffer
> must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> we just malloc a larger buffer.
> 
> This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
> buffer size.
> 
> Signed-off-by: David Turner 
> ---
>  cache.h   |  1 +
>  config.c  | 17 +
>  http.c|  4 ++--
>  http.h|  2 +-
>  remote-curl.c |  6 +++---
>  5 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index fbdf7a815a..5e6747dbb4 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
>  extern int git_config_int(const char *, const char *);
>  extern int64_t git_config_int64(const char *, const char *);
>  extern unsigned long git_config_ulong(const char *, const char *);
> +extern ssize_t git_config_ssize_t(const char *, const char *);
>  extern int git_config_bool_or_int(const char *, const char *, int *);
>  extern int git_config_bool(const char *, const char *);
>  extern int git_config_maybe_bool(const char *, const char *);
> diff --git a/config.c b/config.c
> index 1a4d85537b..de5b155a4e 100644
> --- a/config.c
> +++ b/config.c
> @@ -834,6 +834,15 @@ int git_parse_ulong(const char *value, unsigned long 
> *ret)
>   return 1;
>  }
>  
> +static int git_parse_ssize_t(const char *value, ssize_t *ret)
> +{
> + ssize_t tmp;
> + if (!git_parse_signed(value, &tmp, 
> maximum_signed_value_of_type(ssize_t)))
> + return 0;
> + *ret = tmp;
> + return 1;
> +}
> +
>  NORETURN
>  static void die_bad_number(const char *name, const char *value)
>  {
> @@ -892,6 +901,14 @@ unsigned long git_config_ulong(const char *name, const 
> char *value)
>   return ret;
>  }
>  
> +ssize_t git_config_ssize_t(const char *name, const char *value)
> +{
> + ssize_t ret;
> + if (!git_parse_ssize_t(value, &ret))
> + die_bad_number(name, value);
> + return ret;
> +}
> +
>  int git_parse_maybe_bool(const char *value)
>  {
>   if (!value)
> diff --git a/http.c b/http.c
> index 96d84bbed3..22f8167ba2 100644
> --- a/http.c
> +++ b/http.c
> @@ -19,7 +19,7 @@ long int git_curl_ipresolve;
>  #endif
>  int active_requests;
>  int http_is_verbose;
> -size_t http_post_buffer = 16 * LARGE_PACKET_MAX;
> +ssize_t http_post_buffer = 16 * LARGE_PACKET_MAX;
>  
>  #if LIBCURL_VERSION_NUM >= 0x070a06
>  #define LIBCURL_CAN_HANDLE_AUTH_ANY
> @@ -331,7 +331,7 @@ static int http_options(const char *var, const char 
> *value, void *cb)
>   }
>  
>   if (!strcmp("http.postbuffer", var)) {
> - http_post_buffer = git_config_int(var, value);
> + http_post_buffer = git_config_ssize_t(var, value);
>   if (http_post_buffer < LARGE_PACKET_MAX)
>   http_post_buffer = LARGE_PACKET_MAX;
>   return 0;
> diff --git a/http.h b/http.h
> index 02bccb7b0c..f7bd3b26b0 100644
> --- a/http.h
> +++ b/http.h
> @@ -111,7 +111,7 @@ extern struct curl_slist *http_copy_default_headers(void);
>  extern long int git_curl_ipresolve;
>  extern int active_requests;
>  extern int http_is_verbose;
> -extern size_t http_post_buffer;
> +extern ssize_t http_post_buffer;
>  extern struct credential http_auth;
>  
>  extern char curl_errorstr[CURL_ERROR_SIZE];
> diff --git a/remote-curl.c b/remote-curl.c
> index e953d06f66..69b4d71e4c 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -614,7 +614,7 @@ static int post_rpc(struct rpc_state *rpc)
>* and we just need to send it.
>*/
>   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
> - curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, gzip_size);
> + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
> (curl_off_t) gzip_size);

Is this cast safe, or can it silently truncate?
Or is it safer to die() in this case, similar to what we do in
git-compat-util.h?

static inline size_t xsize_t(off_t len)
{
if (len > (size_t) len)
die("Cannot handle files this big");
return (size_t)len;
}


 
>  
>   } else if (use_gzip && 1024 < rpc->len) {
>   /* The client backend isn't giving us compressed data so
> @@ -645,7 +645,7 @@ static int post_rpc(struct rpc_state *rpc)
>  
>   headers = curl_slist_append(headers, "Content-Encoding: gzip");
>   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
> - curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, gzip_size);
> + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
> (curl_off_t) gzip_size);
>  
>   if (options.verbosity > 1) {
>   fprintf(stderr, "POST %s (gzip %lu to %lu bytes)\n",
> @@ -658,7 +658,7 @@ static int post_rpc(struct rpc_state *rpc)
>* more normal 

[PATCH v4 3/4] t1450-fsck: test core.checksumindex

2017-04-03 Thread git
From: Jeff Hostetler 

Add test for fsck to force verify the index checksum.
Add test to demonstrate that status does ignore the checksum
when reading the index.

Signed-off-by: Jeff Hostetler 
---
 t/t1450-fsck.sh | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..a33d542 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,31 @@ test_expect_success 'bogus head does not fallback to all 
heads' '
! grep $blob out
 '
 
+test_expect_success 'validate force_core_checksum_index=1' '
+   git fsck --cache
+'
+
+test_expect_success PERL 'detect index file corrupt in fsck' '
+   cp .git/index .git/index.backup &&
+   echo  > &&
+   git add  &&
+   perl -pi -e "s///" .git/index &&
+   test_must_fail git fsck --cache &&
+   rm .git/index &&
+   mv .git/index.backup .git/index &&
+   rm 
+'
+
+test_expect_success PERL 'verify status ignores corrupt checksum' '
+   cp .git/index .git/index.backup &&
+   echo  > &&
+   git add  &&
+   perl -pi -e "s///" .git/index &&
+   git -c core.checksumindex=0 status &&
+   # Status may fix the checksum
+   rm .git/index &&
+   mv .git/index.backup .git/index &&
+   rm 
+'
+
 test_done
-- 
2.9.3



[PATCH v4 1/4] read-cache: core.checksumindex

2017-04-03 Thread git
From: Jeff Hostetler 

Teach git to skip verification of the SHA-1 checksum at the end of
the index file in verify_hdr() called from read_index() when the
core.checksumIndex configuration variable is set to false.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

On the Linux kernel repository, the effect is rather trivial.
The time to read its index with 58k entries drops from 0.0284 sec
down to 0.0155 sec.

On my Windows source tree (450MB index), I'm seeing a savings of 0.6
seconds -- read_index() went from 1.2 to 0.6 seconds.

Signed-off-by: Jeff Hostetler 
Signed-off-by: Junio C Hamano 
---
 Documentation/config.txt |  8 
 read-cache.c | 12 
 2 files changed, 20 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 47603f5..0f72bdd 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -267,6 +267,14 @@ advice.*::
show directions on how to proceed from the current state.
 --
 
+core.checksumIndex::
+   Tell Git to validate the checksum at the end of the index
+   file to detect corruption.  Defaults to `true`.  Those who
+   work on a project with too many files may want to set this
+   variable to `false` to make it faster to load the index (in
+   exchange for reliability, but in general modern disks are
+   reliable enough for most people).
+
 core.fileMode::
Tells Git if the executable bit of files in the working tree
is to be honored.
diff --git a/read-cache.c b/read-cache.c
index 9054369..dd64cde 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1376,12 +1376,24 @@ static int verify_hdr(struct cache_header *hdr, 
unsigned long size)
git_SHA_CTX c;
unsigned char sha1[20];
int hdr_version;
+   int do_checksum = 1; /* default to true for now */
 
if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
return error("bad signature");
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
+
+   /*
+* Since we run very early in command startup, git_config()
+* may not have been called yet and the various "core_*"
+* global variables haven't been set.  So look it up
+* explicitly.
+*/
+   git_config_get_bool("core.checksumindex", &do_checksum);
+   if (!do_checksum)
+   return 0;
+
git_SHA1_Init(&c);
git_SHA1_Update(&c, hdr, size - 20);
git_SHA1_Final(sha1, &c);
-- 
2.9.3



[PATCH v4 0/4] read-cache: call verify_hdr() in a background thread

2017-04-03 Thread git
From: Jeff Hostetler 

Version 4 of this patch series incorporates the cleanup made by Junio
to my 3rd version, updates fsck to always verify the checksum, and adds
a new simplified perf test using p0002 as discussed on the mailing list.


Version 3 of this patch series simplifies this effort to just turn
on/off the hash verification using a "core.checksumindex" config variable.

I've preserved the original checksum validation code so that we can
force it on in fsck if desired.

It eliminates the original threading model completely.

Jeff Hostetler (4):
  read-cache: core.checksumindex
  fsck: force core.checksumindex=1
  t1450-fsck: test core.checksumindex
  p0002-read-cache: test core.checksumindex

 Documentation/config.txt   |  8 
 builtin/fsck.c |  1 +
 cache.h|  7 +++
 read-cache.c   | 18 ++
 t/perf/p0002-read-cache.sh |  8 +++-
 t/t1450-fsck.sh| 27 +++
 6 files changed, 68 insertions(+), 1 deletion(-)

-- 
2.9.3



[PATCH v4 4/4] p0002-read-cache: test core.checksumindex

2017-04-03 Thread git
From: Jeff Hostetler 

Teach t/perf/p0002-read-cache to time read_cache() with
and without the index checksum calculation.

Signed-off-by: Jeff Hostetler 
---
 t/perf/p0002-read-cache.sh | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/perf/p0002-read-cache.sh b/t/perf/p0002-read-cache.sh
index 9180ae9..71feacd 100755
--- a/t/perf/p0002-read-cache.sh
+++ b/t/perf/p0002-read-cache.sh
@@ -7,7 +7,13 @@ test_description="Tests performance of reading the index"
 test_perf_default_repo
 
 count=1000
-test_perf "read_cache/discard_cache $count times" "
+test_perf "read_cache/discard_cache checksum=1 $count times" "
+   git config --local core.checksumindex 1 &&
+   test-read-cache $count
+"
+
+test_perf "read_cache/discard_cache checksum=0 $count times" "
+   git config --local core.checksumindex 0 &&
test-read-cache $count
 "
 
-- 
2.9.3



[PATCH v4 2/4] fsck: force core.checksumindex=1

2017-04-03 Thread git
From: Jeff Hostetler 

Teach fsck to override core.checksumindex and always verify
the index checksum when reading the index.

Signed-off-by: Jeff Hostetler 
---
 builtin/fsck.c |  1 +
 cache.h|  7 +++
 read-cache.c   | 20 +---
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1a5cacc..6913233 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
if (keep_cache_objects) {
+   force_core_checksum_index = 1;
read_cache();
for (i = 0; i < active_nr; i++) {
unsigned int mode;
diff --git a/cache.h b/cache.h
index 80b6372..3ebda0a 100644
--- a/cache.h
+++ b/cache.h
@@ -685,6 +685,13 @@ extern void update_index_if_able(struct index_state *, 
struct lock_file *);
 extern int hold_locked_index(struct lock_file *, int);
 extern void set_alternate_index_output(const char *);
 
+/*
+ * Override "core.checksumindex" config settings.  Allows commands
+ * like "fsck" to force it without altering on-disk settings in case
+ * routines call die() before it can be reset.
+ */
+extern int force_core_checksum_index;
+
 /* Environment bits from configuration mechanism */
 extern int trust_executable_bit;
 extern int trust_ctime;
diff --git a/read-cache.c b/read-cache.c
index dd64cde..36fdc2a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1371,6 +1371,8 @@ struct ondisk_cache_entry_extended {
ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
ondisk_cache_entry_size(ce_namelen(ce)))
 
+int force_core_checksum_index;
+
 static int verify_hdr(struct cache_header *hdr, unsigned long size)
 {
git_SHA_CTX c;
@@ -1384,13 +1386,17 @@ static int verify_hdr(struct cache_header *hdr, 
unsigned long size)
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
 
-   /*
-* Since we run very early in command startup, git_config()
-* may not have been called yet and the various "core_*"
-* global variables haven't been set.  So look it up
-* explicitly.
-*/
-   git_config_get_bool("core.checksumindex", &do_checksum);
+   if (force_core_checksum_index)
+   do_checksum = 1;
+   else {
+   /*
+* Since we run very early in command startup, git_config()
+* may not have been called yet and the various "core_*"
+* global variables haven't been set.  So look it up
+* explicitly.
+*/
+git_config_get_bool("core.checksumindex", &do_checksum);
+   }
if (!do_checksum)
return 0;
 
-- 
2.9.3



Re: [PATCH v7 4/5] dir_iterator: refactor state machine model

2017-04-03 Thread Daniel Ferreira (theiostream)
On Mon, Apr 3, 2017 at 12:36 AM, Michael Haggerty  wrote:
> As far as I can tell, you got the logic in this complicated big loop
> correct on the first try (well, if we ignore v6 :-) ), even as you added
> new features. I think that's good evidence that the new structure is
> more comprehensible than the old, plus the new tests probably helped.
> That's a big win!

Thanks for ignoring v6 ;)

Another gain is that the other proposed features (only iterate over
directories, do not recurse into subdirectories) are also quite easy
to add with this new structure.

> It's not ideal that the test code depends on the numerical values of the
> flag constants, and it makes the tests harder to understand. It would be
> better if this program were to accept options like `--pre-order`,
> `--post-order`, etc., as I suggested in an earlier round of review.

Although it does make tests harder to understand, if we were to
specify how to iterate with human-readable flags we'd add the getopt()
+ flag configuration overhead to this helper program to be able to
handle all cases properly. Additionally, anyone adding new flags to
dir_iterator would have to edit the test program as well, generating
extra work.

I personally think that the string describing the test in the script
is enough to explain what the flag-as-argument is doing for the sake
of readability. The only gain I see in your suggestion is that we
avoid the programmer committing errors calculating the flag by hand
when writing the test.

That said, I'd appreciate some more thought on this.

> Michael
>

Thanks for the review. I agree with all other points and I'll address
them in a next series.


Re: [PATCH] Fix 'git am' in-body header continuations

2017-04-03 Thread Jonathan Tan

This looks good to me.

On 04/02/2017 05:49 PM, Linus Torvalds wrote:


From: Linus Torvalds 
Date: Sat, 1 Apr 2017 12:14:39 -0700
Subject: [PATCH] Fix 'git am' in-body header continuations

An empty line should stop any pending in-body headers, and start the
actual body parsing.

This also modifies the original test for the in-body headers to actually
have a real commit body that starts with spaces, and changes the test to
check that the long line matches _exactly_, and doesn't get extra data
from the body.

Fixes: 6b4b013f1884 ("mailinfo: handle in-body header continuations")
Cc: Jonathan Tan 
Cc: Jeff King 
Signed-off-by: Linus Torvalds 
---
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index 89a5bacac..44807e218 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -983,7 +983,9 @@ test_expect_success 'am works with multi-line in-body 
headers' '
rm -fr .git/rebase-apply &&
git checkout -f first &&
echo one >> file &&
-   git commit -am "$LONG" --author="$LONG " &&
+   git commit -am "$LONG
+
+Body test" --author="$LONG " &&


Instead of "Body test", I would write something more descriptive like 
"Not a continuation line because of blank line above", but I'm fine with 
either.



git format-patch --stdout -1 >patch &&
# bump from, date, and subject down to in-body header
perl -lpe "
@@ -997,7 +999,7 @@ test_expect_success 'am works with multi-line in-body 
headers' '
git am msg &&
# Ensure that the author and full message are present
git cat-file commit HEAD | grep "^author.*l...@example.com" &&
-   git cat-file commit HEAD | grep "^$LONG"
+   git cat-file commit HEAD | grep "^$LONG$"
 '

 test_done



Re: Bug in "git am" when the body starts with spaces

2017-04-03 Thread Jonathan Tan

Thanks, everyone, for looking into this.

On 04/01/2017 09:18 PM, Jeff King wrote:

On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote:

The logic is fairly simple: if we encounter an empty line, and we have
pending in-body headers, we flush the pending headers, and mark us as
no longer in header mode.


Hmm. I think this may work. At first I thought it was too strict in
always checking inbody_header_accum.len, because we want this to kick in
always, whether there's whitespace continuation or not. But that
accumulator has to collect preemptively, before it knows if there's
continuation. So it will always be non-empty if we've seen _any_ header,
and it will remain non-empty as long as we keep parsing (because any
time we flush, we do so in order to handle another line).

IOW, I think this implements the state-machine thing I wrote in my
earlier email, because the state "are we inside in-body header parsing"
is always reflected by having a non-empty accumulator. It is a bit
non-obvious though.


About obviousness: I think of a non-empty accumulator as merely 
representing that the next line could potentially be a continuation 
line. It is a coincidence that this implies "are we inside in-body 
header parsing"; if not all in-body header lines could be "continued", 
there would be no such implication.


mi->inbody_header_accum.len is already used in check_inbody_header to 
mean "could the next line potentially be a continuation line" and to 
trigger a check for a negative criterion (in this case, a scissors 
line). I think it's fine to do the same thing, the negative criterion 
here being a blank line.


[PATCH v4] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread David Turner
Unfortunately, in order to push some large repos, the http postbuffer
must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
we just malloc a larger buffer.

This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
buffer size.

Signed-off-by: David Turner 
---
 cache.h   |  1 +
 config.c  | 17 +
 http.c|  4 ++--
 http.h|  2 +-
 remote-curl.c |  6 +++---
 5 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index fbdf7a815a..5e6747dbb4 100644
--- a/cache.h
+++ b/cache.h
@@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
 extern int git_config_int(const char *, const char *);
 extern int64_t git_config_int64(const char *, const char *);
 extern unsigned long git_config_ulong(const char *, const char *);
+extern ssize_t git_config_ssize_t(const char *, const char *);
 extern int git_config_bool_or_int(const char *, const char *, int *);
 extern int git_config_bool(const char *, const char *);
 extern int git_config_maybe_bool(const char *, const char *);
diff --git a/config.c b/config.c
index 1a4d85537b..de5b155a4e 100644
--- a/config.c
+++ b/config.c
@@ -834,6 +834,15 @@ int git_parse_ulong(const char *value, unsigned long *ret)
return 1;
 }
 
+static int git_parse_ssize_t(const char *value, ssize_t *ret)
+{
+   ssize_t tmp;
+   if (!git_parse_signed(value, &tmp, 
maximum_signed_value_of_type(ssize_t)))
+   return 0;
+   *ret = tmp;
+   return 1;
+}
+
 NORETURN
 static void die_bad_number(const char *name, const char *value)
 {
@@ -892,6 +901,14 @@ unsigned long git_config_ulong(const char *name, const 
char *value)
return ret;
 }
 
+ssize_t git_config_ssize_t(const char *name, const char *value)
+{
+   ssize_t ret;
+   if (!git_parse_ssize_t(value, &ret))
+   die_bad_number(name, value);
+   return ret;
+}
+
 int git_parse_maybe_bool(const char *value)
 {
if (!value)
diff --git a/http.c b/http.c
index 96d84bbed3..22f8167ba2 100644
--- a/http.c
+++ b/http.c
@@ -19,7 +19,7 @@ long int git_curl_ipresolve;
 #endif
 int active_requests;
 int http_is_verbose;
-size_t http_post_buffer = 16 * LARGE_PACKET_MAX;
+ssize_t http_post_buffer = 16 * LARGE_PACKET_MAX;
 
 #if LIBCURL_VERSION_NUM >= 0x070a06
 #define LIBCURL_CAN_HANDLE_AUTH_ANY
@@ -331,7 +331,7 @@ static int http_options(const char *var, const char *value, 
void *cb)
}
 
if (!strcmp("http.postbuffer", var)) {
-   http_post_buffer = git_config_int(var, value);
+   http_post_buffer = git_config_ssize_t(var, value);
if (http_post_buffer < LARGE_PACKET_MAX)
http_post_buffer = LARGE_PACKET_MAX;
return 0;
diff --git a/http.h b/http.h
index 02bccb7b0c..f7bd3b26b0 100644
--- a/http.h
+++ b/http.h
@@ -111,7 +111,7 @@ extern struct curl_slist *http_copy_default_headers(void);
 extern long int git_curl_ipresolve;
 extern int active_requests;
 extern int http_is_verbose;
-extern size_t http_post_buffer;
+extern ssize_t http_post_buffer;
 extern struct credential http_auth;
 
 extern char curl_errorstr[CURL_ERROR_SIZE];
diff --git a/remote-curl.c b/remote-curl.c
index e953d06f66..69b4d71e4c 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -614,7 +614,7 @@ static int post_rpc(struct rpc_state *rpc)
 * and we just need to send it.
 */
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
-   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, gzip_size);
+   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
(curl_off_t) gzip_size);
 
} else if (use_gzip && 1024 < rpc->len) {
/* The client backend isn't giving us compressed data so
@@ -645,7 +645,7 @@ static int post_rpc(struct rpc_state *rpc)
 
headers = curl_slist_append(headers, "Content-Encoding: gzip");
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
-   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, gzip_size);
+   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
(curl_off_t) gzip_size);
 
if (options.verbosity > 1) {
fprintf(stderr, "POST %s (gzip %lu to %lu bytes)\n",
@@ -658,7 +658,7 @@ static int post_rpc(struct rpc_state *rpc)
 * more normal Content-Length approach.
 */
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, rpc->buf);
-   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
+   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, 
(curl_off_t) rpc->len);
if (options.verbosity > 1) {
fprintf(stderr, "POST %s (%lu bytes)\n",
rpc->service_name, (unsigned long)rpc->len);
-- 
2.11.GIT



Re: [PATCH v3 3/4] name-rev: provide debug output

2017-04-03 Thread Junio C Hamano
Michael J Gruber  writes:

> No, I checked not to change the existing behaviour.
>
> If you look at the comment above that then you see that one of the sides
> of the comparison is a non-tag, so we compare 0 to 1 or 2, the boolean
> outcome being the same.

That one I understand, but if you compare 1 and 2 (one side being a
lightweight, the other being an annotated tag), they no longer
compare equal, no?


RE: [PATCH v3] http.postbuffer: allow full range of ssize_t values

2017-04-03 Thread David Turner


> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Saturday, April 1, 2017 2:01 AM
> To: David Turner 
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH v3] http.postbuffer: allow full range of ssize_t values
> 
> On Fri, Mar 31, 2017 at 01:26:31PM -0400, David Turner wrote:
> 
> > Unfortunately, in order to push some large repos, the http postbuffer
> > must sometimes exceed two gigabytes.  On a 64-bit system, this is OK:
> > we just malloc a larger buffer.
> 
> I'm still not sure why a 2GB post-buffer is necessary. It sounds like 
> something
> is broken in your setup. Large pushes should be sent chunked.
> 
> I know broken setups are a fact of life, but this feels like a really hacky 
> work-
> around.

I'm not sure what other workaround I should use.  I guess I could do multiple 
pushes, but only if individual objects are under the size limit, and I'm not 
sure all of mine are (maybe I'll get lucky tho).  I know that this is a 
configuration issue with gitlab: 
https://gitlab.com/gitlab-org/gitlab-ce/issues/30315 but I don't know when that 
will get fixed.  I could manually copy the repo to the server and do a local 
push, but I don't know that I have the necessary permissions to do that. Or I 
could do this, which would hopefully actually solve the problem.

> > diff --git a/cache.h b/cache.h
> > index fbdf7a815a..5e6747dbb4 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1900,6 +1900,7 @@ extern int git_parse_maybe_bool(const char *);
> > extern int git_config_int(const char *, const char *);  extern int64_t
> > git_config_int64(const char *, const char *);  extern unsigned long
> > git_config_ulong(const char *, const char *);
> > +extern ssize_t git_config_ssize_t(const char *, const char *);
> 
> For most of our other "big" values we use git_config_ulong(). E.g.,
> core.bigfilethreshold. I suspect that would be fine for your purposes here,
> though using size_t is more correct (on Windows "unsigned long" is still only
> 32 bits, even on 64-bit systems).
> 
> The ultimate fate of this number, though, is to be handed to:
> 
>   curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, rpc->len);
> 
> where the final argument is interpreted as a long. So I suspect that on 64-bit
> Windows, setting http.postbuffer to "3G" would cause some kind of weird
> error (either a truncated post or some internal curl error due to the negative
> size, depending on how curl handles it).

Ah, so we would need to use CURLOPT_POSTFIELDSIZE_LARGE.  Will re-roll.

> 
> I think a "git_config_long()" would probably do everything correctly.
> 
> > +static int git_parse_ssize_t(const char *value, ssize_t *ret) {
> > +   ssize_t tmp;
> > +   if (!git_parse_signed(value, &tmp,
> maximum_signed_value_of_type(ssize_t)))
> > +   return 0;
> > +   *ret = tmp;
> > +   return 1;
> > +}
> 
> I saw the earlier iteration used a size_t, but you switched it after the 
> compiler
> (rightfully) complained about the signedness. But I'm not sure why we would
> want ssize_t here instead of just using git_parse_unsigned().

It was originally signed.  I'm not sure why that was, but I figured it would be 
simpler to save the extra bit just in case.


Re: Bug? git submodule update --reference doesn't use the referenced repository

2017-04-03 Thread Stefan Beller
On Sun, Apr 2, 2017 at 8:13 PM, Maxime Viargues
 wrote:
> Hi there,
>
> I have been trying to use the --reference option to clone a big repository
> using a local copy, but I can't manage to make it work using sub-module
> update. I believe this is a bug, unless I missed something.
> I am on Windows, Git 2.12.0

which is new enough that the new --reference code is in. :)

>
> So the problem is as follow:
> - I have got a repository with multiple sub-modules, say
> main
> lib1
> sub-module1.git
> lib2
> sub-module2.git
> - The original repositories are in GitHub, which makes it slow
> - I have done a normal git clone of the entire repository (not bare) and put
> it on a file server, say \\fileserver\ref_repo\
> (Note that the problem also happens with local copy)
>
> So if I do a clone to get the repo and all the submodules with...
> git clone --reference-if-able \\fileserver\ref-repo --recursive
> g...@github.com:company/main
> ...then it all works, all the sub-modules get cloned and the it's fast.

great. :)

> Now in my case I am working with Jenkins jobs and I need to first do a
> clone, and then get the sub-modules, but if I do...
> git clone --reference-if-able \\fileserver\ref-repo
> g...@github.com:company/main (so non-recursive)
> cd main
> git submodule update --init --reference \\fileserver\ref-repo
> ... then this takes ages, as it would normally do without the use of
> --reference. I suspect it's not actually using it.

So to confirm your suspicion, can you run

  GIT_TRACE=1 git clone ...
  cd main && GIT_TRACE=1 git submodule update ...

to see which child processes are spawned to deal with the submodules?
Also to confirm, it is the "submodule update" that is taking so long for you?

> The git clone documentation mentions that the reference is then passed to
> the sub-module clone commands, so I would expect "git clone --recursive" to
> work the same as "git submodule update", as far as --reference is concerned.

Oh, there we have an opportunity to improve the man page (or the code).

git clone --reference --recursive ...

will set the config variables

git config submodule.alternateLocation superproject
git config submodule.alternateErrorStrategy die (or "info" for
--reference-if-able)

and the clone of the submodules (which runs as an independent process just
after the clone of the superproject is done) will pick up these config
variables and act accordingly.

If you only run

git clone --reference ...

then these variables are not set. Probably they should be set, such that a
later invocation of "git submodule update --init" behaves the same as the
git-clone of the superproject did.

So as a workaround for you to get up to speed again, you can just set these
config variables yourself before running "submodule update --init" and it
should work.
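Concretely, a sketch of that workaround (using a throwaway repository so the commands can be run anywhere; in a real setup you would run the two "git config" calls inside the superproject clone, "main", before the submodule update):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"

# The two variables that "git clone --reference --recursive" would have set;
# use "die" instead of "info" for the plain --reference behavior.
git -C "$repo" config submodule.alternateLocation superproject
git -C "$repo" config submodule.alternateErrorStrategy info

git -C "$repo" config --get submodule.alternateLocation
```

After that, "git submodule update --init" run in the same repository should consult the reference the way the recursive clone does.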

>
> I noticed for a single module, doing a...
> git submodule update --init --reference
> \\fileserver\ref-repo\lib1\sub-module1 -- lib1/sub-module1
> ...i.e. adding the sub-module path to the reference path, works. Which kind
> of make sense but then how do you do to apply it to all the sub-modules?
> (without writing a script to do that)

I think that functionality is broken, as it takes the same reference for
all submodules; you need to go through the submodules one by one and give
each its submodule-specific reference location.

>
> If someone can confirm the problem or explain to me what I am doing wrong,
> that would be great.
>
> Maxime

Stefan


Re: [PATCH] pathspec: always honor `PATHSPEC_PREFIX_ORIGIN` flag

2017-04-03 Thread Brandon Williams
On 04/03, Patrick Steinhardt wrote:
> Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
> pathspec elements, 2017-01-04), we were always using the computed
> `match` variable to perform pathspec matching whenever
> `PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
> the parsed pathspecs to other commands, as the computed `match` may
> contain a pathspec relative to the repository root. The commit changed
> this logic to only do so when we do have an actual prefix and when
> literal pathspecs are deactivated.
> 
> But this change may actually break some commands which expect passed
> pathspecs to be relative to the repository root. One such case is `git
> add --patch`, which now fails when using relative paths from a
> subdirectory. For example if executing "git add -p ../foo.c" in a
> subdirectory, the `git-add--interactive` command will directly pass
> "../foo.c" to `git-ls-files`. As ls-files is executed at the
> repository's root, the command will notice that "../foo.c" is outside
> the repository and fail.
> 
> Fix the issue by again using the computed `match` variable whenever
> `PATHSPEC_PREFIX_ORIGIN` is set. This restores behavior previous to
> 5d8f084a5 and fixes interactive add.
> 
> Signed-off-by: Patrick Steinhardt 
> ---
>  pathspec.c |  6 +++---
>  t/t3701-add-interactive.sh | 20 
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/pathspec.c b/pathspec.c
> index 303efda83..3193e45a6 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -504,12 +504,12 @@ static void init_pathspec_item(struct pathspec_item 
> *item, unsigned flags,
>* Prefix the pathspec (keep all magic) and assign to
>* original. Useful for passing to another command.
>*/
> - if ((flags & PATHSPEC_PREFIX_ORIGIN) &&
> - prefixlen && !get_literal_global()) {
> + if (flags & PATHSPEC_PREFIX_ORIGIN) {
>   struct strbuf sb = STRBUF_INIT;
>  
>   /* Preserve the actual prefix length of each pattern */
> - prefix_magic(, prefixlen, element_magic);
> + if (prefixlen && !get_literal_global())
> + prefix_magic(, prefixlen, element_magic);
>  
>   strbuf_addstr(, match);
>   item->original = strbuf_detach(, NULL);

Would it just make sense to drop the requirement that prefixlen be
non-zero?  My problem with this change currently is the possibility of
getting an original string which is empty (i.e. "\0"), which would cause
git to throw some warnings about not allowing empty strings as pathspecs
if they were then passed on to other processes.

> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
> index f9528fa00..640088dd6 100755
> --- a/t/t3701-add-interactive.sh
> +++ b/t/t3701-add-interactive.sh
> @@ -436,6 +436,26 @@ test_expect_success 'add -p handles globs' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'add -p handles relative paths' '
> + git reset --hard &&
> +
> + echo base >root.c &&
> + git add "*.c" &&
> + git commit -m base &&
> +
> + echo change >root.c &&
> + mkdir -p subdir &&
> + git -C subdir add -p "../root.c" <<-\EOF &&
> + y
> + EOF
> +
> + cat >expect <<-\EOF &&
> + root.c
> + EOF
> + git diff --cached --name-only >actual &&
> + test_cmp expect actual
> +'
> +
>  test_expect_success 'add -p does not expand argument lists' '
>   git reset --hard &&
>  
> -- 
> 2.12.2
> 

-- 
Brandon Williams


RE: How do you script linux GIT client to pass kerberos credential to apache enabled GIT server?

2017-04-03 Thread Randall S. Becker
-Original Message-
On April 3, 2017 12:04 PM, Ken Edward Wrote:
>I have my git repositories behind an apache server configured with kerberos.
>Works fine if the user is logged in on their workstation.
>Apache gets the kerberos credential, validates it, and then sends the GIT
>repo being requested.
>BUT, I want to write a script on linux that will also pass the kerberos
>credential to the apache GIT server without any manual intervention.
>Seems I would create a kerberos keytab for the principal and then use that to
>authenticate. kinit supports authenticating from a keytab using the -k -t
><keytab> options, but has anyone done this?

Have you attempted prototyping this using curl? It might be able to help out a 
bit. I have done this in the past with Stash and their REST and credentials, 
but not using Kerberos. Just a thought.
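For what it's worth, a minimal shape such a curl prototype could take. The host and repo URL are placeholders; "--negotiate -u :" tells curl to take the identity from the Kerberos ticket cache (populated beforehand by "kinit -k -t <keytab> <principal>"), and "info/refs?service=git-upload-pack" is the smart-HTTP discovery request git itself makes first. The command is only echoed here, since actually running it needs a live KDC:

```shell
base="https://git.example.com/repo.git"   # placeholder URL

# After kinit has filled the ticket cache, this is the first request
# git's own HTTP client would issue against the repository:
echo curl --negotiate -u : "$base/info/refs?service=git-upload-pack"
```

If the echoed command works when run for real, wiring the same credential into git (which also uses libcurl) should behave the same way.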
Cheers,

Randall



How do you script linux GIT client to pass kerberos credential to apache enabled GIT server?

2017-04-03 Thread ken edward
Hello,

I have my git repositories behind an apache server configured with
kerberos. It works fine if the user is logged in on their workstation:
Apache gets the kerberos credential, validates it, and then sends the
GIT repo being requested.

BUT, I want to write a script on linux that will also pass the
kerberos credential to the apache GIT server without any manual
intervention. It seems I would create a kerberos keytab for the
principal and then use that to authenticate. kinit supports
authenticating from a keytab using the -k -t <keytab> options,
but has anyone done this?

Keith


[PATCH v2 0/2] name-hash: fix buffer overrun

2017-04-03 Thread git
From: Jeff Hostetler 

Version 2 of this fix smashes the HT and chmod issues in the
test code discussed on the mailing list.  The test has also
been updated to skip on 1 cpu systems.


Fix buffer overrun in handle_range_dir() when the final entry
in the index was the only file in the last directory, such as
"a/b/foo.txt". The look ahead (k_start + 1) was invalid since
(k_start + 1) == k_end.

Jeff Hostetler (1):
  test-online-cpus: helper to return cpu count

Kevin Willford (1):
  name-hash: fix buffer overrun

 Makefile|  1 +
 name-hash.c |  4 +++-
 t/helper/.gitignore |  1 +
 t/helper/test-online-cpus.c |  8 
 t/t3008-ls-files-lazy-init-name-hash.sh | 27 +++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-online-cpus.c
 create mode 100755 t/t3008-ls-files-lazy-init-name-hash.sh

-- 
2.9.3



[PATCH v2 2/2] name-hash: fix buffer overrun

2017-04-03 Thread git
From: Kevin Willford 

Add a check for the end of the entries for the thread partition.
Add a test for the lazy init name hash with a specific directory structure.

The lazy init name hash was causing a buffer overflow when the last
entry in the index was multiple folders deep, with parent folders that
did not have any files in them.

This adds a test for the boundary condition of the thread partitions
with the folder structure that was triggering the buffer overflow.
The test is skipped on single-cpu machines because the original code
path is used in name-hash.c.

The fix was to check if it is the last entry for the thread partition
in the handle_range_dir and not try to use the next entry in the cache.

Signed-off-by: Kevin Willford 
Signed-off-by: Johannes Schindelin 
Signed-off-by: Jeff Hostetler 
---
 name-hash.c |  4 +++-
 t/t3008-ls-files-lazy-init-name-hash.sh | 27 +++
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100755 t/t3008-ls-files-lazy-init-name-hash.sh

diff --git a/name-hash.c b/name-hash.c
index cac313c..39309ef 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -342,7 +342,9 @@ static int handle_range_dir(
 * Scan forward in the index array for index entries having the same
 * path prefix (that are also in this directory).
 */
-   if (strncmp(istate->cache[k_start + 1]->name, prefix->buf, prefix->len) > 0)
+   if (k_start + 1 >= k_end)
+   k = k_end;
+   else if (strncmp(istate->cache[k_start + 1]->name, prefix->buf, prefix->len) > 0)
k = k_start + 1;
else if (strncmp(istate->cache[k_end - 1]->name, prefix->buf, prefix->len) == 0)
k = k_end;
diff --git a/t/t3008-ls-files-lazy-init-name-hash.sh 
b/t/t3008-ls-files-lazy-init-name-hash.sh
new file mode 100755
index 000..bdf5198
--- /dev/null
+++ b/t/t3008-ls-files-lazy-init-name-hash.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description='Test the lazy init name hash with various folder structures'
+
+. ./test-lib.sh
+
+if test 1 -eq $($GIT_BUILD_DIR/t/helper/test-online-cpus)
+then
+   skip_all='skipping lazy-init tests, single cpu'
+   test_done
+fi
+
+LAZY_THREAD_COST=2000
+
+test_expect_success 'no buffer overflow in lazy_init_name_hash' '
+   (
+   test_seq $LAZY_THREAD_COST | sed "s/^/a_/"
+   echo b/b/b
+   test_seq $LAZY_THREAD_COST | sed "s/^/c_/"
+   test_seq 50 | sed "s/^/d_/" | tr "\n" "/"; echo d
+   ) |
+   sed "s/^/100644 $EMPTY_BLOB /" |
+   git update-index --index-info &&
+   test-lazy-init-name-hash -m
+'
+
+test_done
-- 
2.9.3



[PATCH v2 1/2] test-online-cpus: helper to return cpu count

2017-04-03 Thread git
From: Jeff Hostetler 

Created helper executable to print the value of online_cpus()
allowing multi-threaded tests to be skipped when appropriate.

Signed-off-by: Jeff Hostetler 
---
 Makefile| 1 +
 t/helper/.gitignore | 1 +
 t/helper/test-online-cpus.c | 8 
 3 files changed, 10 insertions(+)
 create mode 100644 t/helper/test-online-cpus.c

diff --git a/Makefile b/Makefile
index 9b36068..3bb31e9 100644
--- a/Makefile
+++ b/Makefile
@@ -626,6 +626,7 @@ TEST_PROGRAMS_NEED_X += test-line-buffer
 TEST_PROGRAMS_NEED_X += test-match-trees
 TEST_PROGRAMS_NEED_X += test-mergesort
 TEST_PROGRAMS_NEED_X += test-mktemp
+TEST_PROGRAMS_NEED_X += test-online-cpus
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-prio-queue
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 758ed2e..b05d67c 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -16,6 +16,7 @@
 /test-match-trees
 /test-mergesort
 /test-mktemp
+/test-online-cpus
 /test-parse-options
 /test-path-utils
 /test-prio-queue
diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
new file mode 100644
index 000..c881073
--- /dev/null
+++ b/t/helper/test-online-cpus.c
@@ -0,0 +1,8 @@
+#include "stdio.h"
+#include "thread-utils.h"
+
+int cmd_main(int argc, const char **argv)
+{
+   printf("%d\n", online_cpus());
+   return 0;
+}
-- 
2.9.3



Re: Terrible bad performance for it blame --date=iso -C

2017-04-03 Thread Jakub Narębski
On 03.04.2017 at 12:56, SZEDER Gábor wrote:
> Ulrich Windl wrote:

>> In the other case (for the user bored of waiting seeking for some
>> entertainment ;-)) a "-v" (verbose) option could be useful.  Or at the
>> very least: If git is expecting that some operation will take (or
>> already did take) a lot of time, give some message explaining why it
>> is taking a lot of time, and maybe how to avoid that.
> 
> It already does so by default since v2.8.0, see aba37f495 (blame: add
> support for --[no-]progress option, 2015-12-12).
> 
>   $ time git blame sha1_file.c |wc -l
>   4026
>   
>   real    0m1.744s
>   user    0m1.672s
>   sys     0m0.068s
>   $ time git blame -C -C sha1_file.c |wc -l
>   Blaming lines: 100% (4026/4026), done.
>   4026
>   
>   real    0m3.832s
>   user    0m3.716s
>   sys     0m0.112s
> 
> However, after a short peek at that commit, it only displays progress
> by default when stderr is a terminal, which might not be the case when
> invoked from emacs.

Emacs (magit?) should use `git blame --porcelain`, and do its own
progress report, just like 'git gui blame' and incremental blame mode
of gitweb.
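For reference, a runnable sketch of the streaming interface such front-ends build on (throwaway repository so the commands work anywhere): "--incremental" emits blame records as they are computed, so the caller can render its own progress instead of waiting for the whole run.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
echo hello >"$repo/f"
git -C "$repo" add f
git -C "$repo" -c user.name=a -c user.email=a@example.com commit -qm init

# Records arrive one commit at a time; a UI would update as each is read.
git -C "$repo" blame --incremental -- f
```

Each record starts with the commit hash and line range, followed by header lines such as "author", "summary", and "filename".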

Actually... there already is git-blamed - Minor mode for incremental
blame for Git, and mo-git-blame - An interactive, iterative 'git blame'
mode for Emacs, both available on ELPA (Emacs Lisp Package Archive).

HTH
-- 
Jakub Narębski



Re: [PATCH v3 3/4] name-rev: provide debug output

2017-04-03 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 31.03.2017 20:33:
> Junio C Hamano  writes:
> 
>> Michael J Gruber  writes:
>>
 The only case that this change may make a difference I can think of
 is when you have a tag object pointed at from outside refs/tags
 (e.g. refs/heads/foo is a tag object); if you are trying to change
 the definition of "from_tag" from the current "Is the tip inside
 refs/tags/?" to "Is the tip either inside refs/tags/ or is it a tag
 object anywhere?", that may be a good change (I didn't think things
 through, though), but that shouldn't be hidden inside a commit that
 claims to only add support for debugging.

 What problem are you solving?  
>>>
>>> Sorry, I forgot about that change and failed to mention it.
>>>
>>> It makes no difference in the non-debug case which cares about the
>>> Boolean only. In the debug case, I want to distinguish between
>>> annotated and lightweight tags, just like describe --debug does. By
>>> adding 1 via deref and passing this down, I know that an annotated tag
>>> gets the value 2, a lightweight tag 1 and everything else 0, just like
>>> describe --tags.
>>
>> So it sounds like you meant to do something else, and the
>> implementation is wrong for that something else (i.e. it wouldn't do
>> the right thing for a tag object outside refs/tags/, with or without
>> the "--debug" option passed).
> 
> The damage seems worse, but I may be misreading the code.
> 
> is_better_name() compares name->from_tag and from_tag numerically,
> because it was designed to take a boolean view of that variable.
> Now, an artificially bumped 2 gets compared with name->from_tag that
> may be 1 and gets different priority.  That artificially inflated
> value may be propagated to name->from_tag when the current tip is
> judged as a better basis for naming the object.

No, I checked not to change the existing behaviour.

If you look at the comment above that then you see that one of the sides
of the comparison is a non-tag, so we compare 0 to 1 or 2, the boolean
outcome being the same.

> If this change is only for debugging, perhaps inside the if(data->debug)
> you added, instead of looking at from_tag, you can look at both
> from_tag and deref to choose which prio names to show, without
> butchering the value in the from_tag variable to affect the existing
> code that is exercised with or without --debug?

What I did overlook, though, was that name-rev uses the notion "under
refs/tags" for "being a tag".

In fact, it's puzzling how different describe and name-rev proceed and
weigh the alternatives. I didn't mean to change that.

In retrospect, displaying the "same" debug information for the two
commands doesn't make too much sense as long as they use different
information. name-rev does not distinguish between tag types, so why
even display it?

I think I should change 3/4 to display exactly those bits that name-rev
actually uses for weighing different possible descriptions; they are
different from the "describe-bits". So please withhold 3/4 and 4/4.

Michael


Re: Terrible bad performance for it blame --date=iso -C

2017-04-03 Thread SZEDER Gábor
> In the other case (for the user bored of waiting seeking for some
> entertainment ;-)) a "-v" (verbose) option could be useful.  Or at the
> very least: If git is expecting that some operation will take (or
> already did take) a lot of time, give some message explaining why it
> is taking a lot of time, and maybe how to avoid that.

It already does so by default since v2.8.0, see aba37f495 (blame: add
support for --[no-]progress option, 2015-12-12).

  $ time git blame sha1_file.c |wc -l
  4026
  
  real    0m1.744s
  user    0m1.672s
  sys     0m0.068s
  $ time git blame -C -C sha1_file.c |wc -l
  Blaming lines: 100% (4026/4026), done.
  4026
  
  real    0m3.832s
  user    0m3.716s
  sys     0m0.112s

However, after a short peek at that commit, it only displays progress
by default when stderr is a terminal, which might not be the case when
invoked from emacs.



[PATCH] pathspec: always honor `PATHSPEC_PREFIX_ORIGIN` flag

2017-04-03 Thread Patrick Steinhardt
Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
pathspec elements, 2017-01-04), we were always using the computed
`match` variable to perform pathspec matching whenever
`PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
the parsed pathspecs to other commands, as the computed `match` may
contain a pathspec relative to the repository root. The commit changed
this logic to only do so when we do have an actual prefix and when
literal pathspecs are deactivated.

But this change may actually break some commands which expect passed
pathspecs to be relative to the repository root. One such case is `git
add --patch`, which now fails when using relative paths from a
subdirectory. For example if executing "git add -p ../foo.c" in a
subdirectory, the `git-add--interactive` command will directly pass
"../foo.c" to `git-ls-files`. As ls-files is executed at the
repository's root, the command will notice that "../foo.c" is outside
the repository and fail.

Fix the issue by again using the computed `match` variable whenever
`PATHSPEC_PREFIX_ORIGIN` is set. This restores behavior previous to
5d8f084a5 and fixes interactive add.

Signed-off-by: Patrick Steinhardt 
---
 pathspec.c |  6 +++---
 t/t3701-add-interactive.sh | 20 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 303efda83..3193e45a6 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -504,12 +504,12 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 * Prefix the pathspec (keep all magic) and assign to
 * original. Useful for passing to another command.
 */
-   if ((flags & PATHSPEC_PREFIX_ORIGIN) &&
-   prefixlen && !get_literal_global()) {
+   if (flags & PATHSPEC_PREFIX_ORIGIN) {
struct strbuf sb = STRBUF_INIT;
 
/* Preserve the actual prefix length of each pattern */
-   prefix_magic(&sb, prefixlen, element_magic);
+   if (prefixlen && !get_literal_global())
+   prefix_magic(&sb, prefixlen, element_magic);
 
strbuf_addstr(&sb, match);
item->original = strbuf_detach(&sb, NULL);
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index f9528fa00..640088dd6 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -436,6 +436,26 @@ test_expect_success 'add -p handles globs' '
test_cmp expect actual
 '
 
+test_expect_success 'add -p handles relative paths' '
+   git reset --hard &&
+
+   echo base >root.c &&
+   git add "*.c" &&
+   git commit -m base &&
+
+   echo change >root.c &&
+   mkdir -p subdir &&
+   git -C subdir add -p "../root.c" <<-\EOF &&
+   y
+   EOF
+
+   cat >expect <<-\EOF &&
+   root.c
+   EOF
+   git diff --cached --name-only >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'add -p does not expand argument lists' '
git reset --hard &&
 
-- 
2.12.2



Antw: Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-04-03 Thread Ulrich Windl
Junio,

thanks for explaining! So if there are at least two commits, blame is fast, but 
with only one commit blame tries hard to find another commit that might have 
contributed to the one file?
I verified that without those "-C" options the result is very quick.
In the other case (for the user bored of waiting, seeking some entertainment
;-)) a "-v" (verbose) option could be useful.
Or at the very least: if git expects that some operation will take (or
already did take) a lot of time, give some message explaining why it is
taking so long, and maybe how to avoid that.

Regards,
Ulrich


>>> Junio C Hamano wrote on 31.03.2017 at 17:52 in message:
> "Ulrich Windl"  writes:
> 
>> I was running "vc-annotate" in Emacs for a file from a large
>> repository (>4 files, a big percentage being binary, about 10
>> commits). For the first file the result was presented rather soon, but
>> for a second file the command did not finish even after about 10
>> minutes!
>>
>> The file in question is a rather short text file (124 kB), and
>> according to git log it has one commit.
>>
>> While being bored, I did an strace of the command to find out that a
>> huge number of files is inspected.
> 
> With -C -C the user (vc-annotate?) is asking to inspect a huge number
> of files, to find if the contents of the file (except for the part
> that came from its own previous version) came from other existing
> files.  So this is very much expected.
> 
> It might not be a bad idea to teach "blame" not to pay attention to
> any path that is marked as "-diff" (e.g. binary files) when trying
> to see if remaining contents appeared by borrowing from them.  We do
> not have that heuristic (yet).
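The attribute side of that already exists. A sketch (throwaway repository) of how binary paths would be marked — the diff machinery already honors "-diff", and a future blame heuristic could consult the same attribute:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"

# Mark binaries as "no textual diff"; blame -C -C could learn to skip these.
echo "*.bin -diff" >"$repo/.gitattributes"

# check-attr reports the attribute even for paths that don't exist yet.
git -C "$repo" check-attr diff data.bin
```

This prints `data.bin: diff: unset`, which is how the diff code recognizes such paths as binary today.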