Re: XDL_FAST_HASH can be very slow
I have been working with Peff on this and have more results to share. For background, xdl_hash_record is a hashing function, producing an unsigned long from an input string terminated by either a newline or the end of the mmap'd file. The original xdl_hash_record is essentially DJB hash, which does a multiplication, load, and xor for each byte of the input. Commit 6942efc introduces an "XDL_FAST_HASH" version of the same function that is clearly inspired by the DJB hash, but it does only one multiplication, load, and xor for each 8 bytes of input -- i.e., fewer loads, but also a lot less bit mixing. Less mixing means far more collisions, leading to the performance problems with evil-icons. It's not clear to me if the XDL_FAST_HASH version intended to match the output of the DJB hash function, but it doesn't at all. Peff has been experimenting with using two modern hash functions, FNV and Murmur3. In theory, these should produce fewer collisions than DJB, but in his measurements, they didn't run diff any faster than plain DJB. They do fix the evil-icons problem. I have implemented two simpler possibilities, both of which fix the problems diffing the evil-icons repository: 1. An XDL_FAST_HASH implementation that matches the output of the DJB hash exactly. Its performance is basically the same as DJB, because the only thing is does differently is load 8 bytes at a time instead of 1. It does all the same ALU operations as DJB. 2. Using (hash % prime_number) instead of (hash & ((1
[PATCH v2] unblock and unignore SIGPIPE
Blocked and ignored signals -- but not caught signals -- are inherited across exec. Some callers with sloppy signal-handling behavior can call git with SIGPIPE blocked or ignored, even non-deterministically. When SIGPIPE is blocked or ignored, several git commands can run indefinitely, ignoring EPIPE returns from write() calls, even when the process that called them has gone away. Our specific case involved a pipe of git diff-tree output to a script that reads a limited amount of diff data. In an ideal world, git would never be called with SIGPIPE blocked or ignored. But in the real world, several real potential callers, including Perl, Apache, and Unicorn, sometimes spawn subprocesses with SIGPIPE ignored. It is easier and more productive to harden git against this mistake than to clean it up in every potential parent process. Signed-off-by: Patrick Reynolds Signed-off-by: Junio C Hamano --- 1. Merged Junio's work from pu: moved restore_sigpipe_to_default into git.c and restyled the new tests. 2. Moved the new tests into t0005. This meant switching back to `git diff` as our data generator, as the sample repo in t0005 doesn't have any files for `git ls-files` to output. 3. Squashed. git.c | 22 ++ t/t0005-signals.sh | 22 ++ 2 files changed, 44 insertions(+) diff --git a/git.c b/git.c index 210f1ae..0f03d56 100644 --- a/git.c +++ b/git.c @@ -593,6 +593,26 @@ static int run_argv(int *argcp, const char ***argv) return done_alias; } +/* + * Many parts of Git have subprograms communicate via pipe, expect the + * upstream of the pipe to die with SIGPIPE and the downstream process + * even knows to check and handle EPIPE correctly. Some third-party + * programs that ignore or block SIGPIPE for their own reason forget + * to restore SIGPIPE handling to the default before spawning Git and + * break this carefully orchestrated machinery. + * + * Restore the way SIGPIPE is handled to default, which is what we + * expect. + */ +static void restore_sigpipe_to_default(void) +{ + sigset_t unblock; + + sigemptyset(&unblock); + sigaddset(&unblock, SIGPIPE); + sigprocmask(SIG_UNBLOCK, &unblock, NULL); + signal(SIGPIPE, SIG_DFL); +} int main(int argc, char **av) { @@ -612,6 +632,8 @@ int main(int argc, char **av) */ sanitize_stdfds(); + restore_sigpipe_to_default(); + git_setup_gettext(); trace_command_performance(argv); diff --git a/t/t0005-signals.sh b/t/t0005-signals.sh index 981437b..638a355 100755 --- a/t/t0005-signals.sh +++ b/t/t0005-signals.sh @@ -27,4 +27,26 @@ test_expect_success !MINGW 'signals are propagated using shell convention' ' test_expect_code 143 git sigterm ' +large_git () { + for i in $(test_seq 1 100) + do + git diff --cached --binary || return + done +} + +test_expect_success 'create blob' ' + test-genrandom foo 16384 >file && + git add file +' + +test_expect_success 'a constipated git dies with SIGPIPE' ' + OUT=$( ((large_git; echo $? 1>&3) | :) 3>&1 ) + test "$OUT" -eq 141 +' + +test_expect_success 'a constipated git dies with SIGPIPE even if parent ignores it' ' + OUT=$( ((trap "" PIPE; large_git; echo $? 1>&3) | :) 3>&1 ) + test "$OUT" -eq 141 +' + test_done -- 2.0.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] unblock and unignore SIGPIPE
On Wed, Sep 17, 2014 at 3:11 AM, Jeff King wrote: > Would we want to call it from external C commands, too? For the most > part, git.c is the entry point for running git commands, and any > sanitizing it does will be inherited by sub-commands. But it _is_ still > legal to call dashed commands individually, and even required in some > cases (e.g., git-upload-pack for ssh clients). git-upload-pack is protected pretty well from SIGPIPE shenanigans, because its stdout all goes through write_or_die, as of cdf4fb8. We did, long ago, have some EPIPE problems with upload-pack and SSH clients, but it all predates cdf4fb8. So I think it's redundant to unblock SIGPIPE in git-upload-pack. I'll tidy up as Junio recommended, recheck the tests, and submit an updated patch shortly. --Patrick -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] unblock and unignore SIGPIPE
On Sun, Aug 17, 2014 at 8:14 PM, Eric Wong wrote: > But unicorn would ignore SIGPIPE it if Ruby did not; relying on SIGPIPE > while doing any multiplexed I/O doesn't work well. Exactly. Callers block SIGPIPE for their own legitimate reasons, but they don't consistently unblock it before spawning a git subprocess that needs the default SIGPIPE behavior. Easier to fix it in git than in every potential caller. --Patrick -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] unblock and unignore SIGPIPE
Blocked and ignored signals -- but not caught signals -- are inherited across exec. Some callers with sloppy signal-handling behavior can call git with SIGPIPE blocked or ignored, even non-deterministically. When SIGPIPE is blocked or ignored, several git commands can run indefinitely, ignoring EPIPE returns from write() calls, even when the process that called them has gone away. Our specific case involved a pipe of git diff-tree output to a script that reads a limited amount of diff data. In an ideal world, git would never be called with SIGPIPE blocked or ignored. But in the real world, several real potential callers, including Perl, Apache, and Unicorn, sometimes spawn subprocesses with SIGPIPE ignored. It is easier and more productive to harden git against this mistake than to clean it up in every potential parent process. Signed-off-by: Patrick Reynolds --- cache.h| 1 + git.c | 5 + setup.c| 11 +++ t/t0012-sigpipe.sh | 27 +++ 4 files changed, 44 insertions(+) create mode 100755 t/t0012-sigpipe.sh diff --git a/cache.h b/cache.h index fcb511d..0a89fc1 100644 --- a/cache.h +++ b/cache.h @@ -463,6 +463,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int); extern int init_db(const char *template_dir, unsigned int flags); extern void sanitize_stdfds(void); +extern void sanitize_signals(void); extern int daemonize(void); #define alloc_nr(x) (((x)+16)*3/2) diff --git a/git.c b/git.c index 9c49519..d6b221b 100644 --- a/git.c +++ b/git.c @@ -611,6 +611,11 @@ int main(int argc, char **av) */ sanitize_stdfds(); + /* +* Make sure we aren't ignoring or blocking SIGPIPE. +*/ + sanitize_signals(); + git_setup_gettext(); trace_command_performance(argv); diff --git a/setup.c b/setup.c index 0a22f8b..7aa4b01 100644 --- a/setup.c +++ b/setup.c @@ -865,3 +865,14 @@ int daemonize(void) return 0; #endif } + +/* un-ignore and un-block SIGPIPE */ +void sanitize_signals(void) +{ + sigset_t unblock; + + sigemptyset(&unblock); + sigaddset(&unblock, SIGPIPE); + sigprocmask(SIG_UNBLOCK, &unblock, NULL); + signal(SIGPIPE, SIG_DFL); +} diff --git a/t/t0012-sigpipe.sh b/t/t0012-sigpipe.sh new file mode 100755 index 000..213cde3 --- /dev/null +++ b/t/t0012-sigpipe.sh @@ -0,0 +1,27 @@ +#!/bin/sh + +test_description='check handling of SIGPIPE' +. ./test-lib.sh + +test_expect_success 'create blob' ' + test-genrandom foo 16384 >file && + git add file +' + +large_git () { + for i in $(test_seq 1 100); do + git diff --staged --binary || return $? + done +} + +test_expect_success 'git dies with SIGPIPE' ' + OUT=$( ((large_git; echo $? 1>&3) | true) 3>&1 ) + test "$OUT" -eq 141 +' + +test_expect_success 'git dies with SIGPIPE even if parent ignores it' ' + OUT=$( ((trap "" PIPE; large_git; echo $? 1>&3) | true) 3>&1 ) + test "$OUT" -eq 141 +' + +test_done -- 2.0.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] use a hashmap to make remotes faster
Remotes are stored as an array, so looking one up or adding one without duplication is an O(n) operation. Reading an entire config file full of remotes is O(n^2) in the number of remotes. For a repository with tens of thousands of remotes, the running time can hit multiple minutes. Hash tables are way faster. So we add a hashmap from remote name to struct remote and use it for all lookups. The time to add a new remote to a repo that already has 50,000 remotes drops from ~2 minutes to < 1 second. We retain the old array of remotes so iterators proceed in config-file order. Signed-off-by: Patrick Reynolds --- mark function with no arguments as taking void remote.c | 63 ++- remote.h | 3 +++ 2 files changed, 49 insertions(+), 17 deletions(-) diff --git a/remote.c b/remote.c index a0701f6..52533e4 100644 --- a/remote.c +++ b/remote.c @@ -42,6 +42,7 @@ struct rewrites { static struct remote **remotes; static int remotes_alloc; static int remotes_nr; +static struct hashmap remotes_hash; static struct branch **branches; static int branches_alloc; @@ -136,26 +137,51 @@ static void add_url_alias(struct remote *remote, const char *url) add_pushurl_alias(remote, url); } +struct remotes_hash_key { + const char *str; + int len; +}; + +static int remotes_hash_cmp(const struct remote *a, const struct remote *b, const struct remotes_hash_key *key) +{ + if (key) + return strncmp(a->name, key->str, key->len) || a->name[key->len]; + else + return strcmp(a->name, b->name); +} + +static inline void init_remotes_hash(void) +{ + if (!remotes_hash.cmpfn) + hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, 0); +} + static struct remote *make_remote(const char *name, int len) { - struct remote *ret; - int i; + struct remote *ret, *replaced; + struct remotes_hash_key lookup; + struct hashmap_entry lookup_entry; - for (i = 0; i < remotes_nr; i++) { - if (len ? (!strncmp(name, remotes[i]->name, len) && - !remotes[i]->name[len]) : - !strcmp(name, remotes[i]->name)) - return remotes[i]; - } + if (!len) + len = strlen(name); + + init_remotes_hash(); + lookup.str = name; + lookup.len = len; + hashmap_entry_init(&lookup_entry, memhash(name, len)); + + if ((ret = hashmap_get(&remotes_hash, &lookup_entry, &lookup)) != NULL) + return ret; ret = xcalloc(1, sizeof(struct remote)); ret->prune = -1; /* unspecified */ ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc); remotes[remotes_nr++] = ret; - if (len) - ret->name = xstrndup(name, len); - else - ret->name = xstrdup(name); + ret->name = xstrndup(name, len); + + hashmap_entry_init(ret, lookup_entry.hash); + replaced = hashmap_put(&remotes_hash, ret); + assert(replaced == NULL); /* no previous entry overwritten */ return ret; } @@ -722,13 +748,16 @@ struct remote *pushremote_get(const char *name) int remote_is_configured(const char *name) { - int i; + struct remotes_hash_key lookup; + struct hashmap_entry lookup_entry; read_config(); - for (i = 0; i < remotes_nr; i++) - if (!strcmp(name, remotes[i]->name)) - return 1; - return 0; + init_remotes_hash(); + lookup.str = name; + lookup.len = strlen(name); + hashmap_entry_init(&lookup_entry, memhash(name, lookup.len)); + + return hashmap_get(&remotes_hash, &lookup_entry, &lookup) != NULL; } int for_each_remote(each_remote_fn fn, void *priv) diff --git a/remote.h b/remote.h index 917d383..81cb5ff 100644 --- a/remote.h +++ b/remote.h @@ -1,6 +1,7 @@ #ifndef REMOTE_H #define REMOTE_H +#include "hashmap.h" #include "parse-options.h" enum { @@ -10,6 +11,8 @@ enum { }; struct remote { + struct hashmap_entry ent; /* must be first */ + const char *name; int origin; -- 2.0.0.rc4