Re: [RFC] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-26 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

> Hi Buga,
>
> On Tue, 13 Mar 2018, Igor Djordjevic wrote:
>
>> On 12/03/2018 11:46, Johannes Schindelin wrote:
>> > 
>> > > Sometimes one just needs to read the manual, and I don`t really
>> > > think this is a ton complicated, but just something we didn`t really
>> > > have before (real merge rebasing), so it requires a moment to grasp
>> > > the concept.
>> > 
>> > If that were the case, we would not keep getting bug reports about
>> > --preserve-merges failing to reorder patches.
>> 
>> Not sure where that is heading to, but what I`m arguing about is that 
>> introducing new commands and concepts (`merge`, and with `-R`) just 
>> makes the situation even worse (more stuff to grasp).
>
> The problem with re-using `pick` is that its concept does not apply to
> merges. The cherry-pick of a non-merge commit is well-defined: the current
> HEAD is implicitly chosen as the cherry-picked commit's (single) parent
> commit. There is no ambiguity here.
>
> But for merge commits, we need to specify the parent commits (apart from
> the first one) *explicitly*. There was no need for that in the `pick`
> command, nor in the concept of a cherry-pick.
>
>> Reusing existing concepts where possible doesn`t have this problem.
>
> Existing concepts are great. As long as they fit the requirements of the
> new scenarios. In this case, `pick` does *not* fit the requirement of
> "rebase a merge commit".

It does, provided you use suitable syntax.
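(For what it's worth, stock `git cherry-pick` already has such syntax: picking a merge is well-defined once the mainline parent is given with `-m`. A throwaway sketch, with invented file names and messages, of both the refusal and the explicit form:)

```shell
#!/bin/sh
# Throwaway demo repo; all names below are invented for illustration.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.name demo && git config user.email demo@example.com
git checkout -q -b main
echo base >a && git add a && git commit -qm C0
c0=$(git rev-parse HEAD)
git checkout -q -b side
echo side >b && git add b && git commit -qm S1
git checkout -q main
echo main >c && git add c && git commit -qm C2
git merge -q -m M side                    # merge commit M, parents C2 and S1
m=$(git rev-parse HEAD)
git checkout -q -b target "$c0"
# a plain pick of a merge is ambiguous, so git refuses it:
if git cherry-pick "$m" 2>err; then echo "unexpectedly succeeded"; fi
grep -q 'no -m option' err && echo "picking a merge needs a mainline parent"
# with the parent made explicit, the pick is well-defined:
git cherry-pick -m 1 "$m" >/dev/null
test -f b && test ! -e c && echo "replayed only side's change"
```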

> If you really want to force the `pick` concept onto the use case where
> you need to "reapply" merges, then the closest you get really is
> Sergey's idea, which I came to reject when considering its practical
> implications.

Which one, and what are the implications that are bad, I wonder?

> Even so, you would have to make the `pick` command more complicated to
> support merge commits. And whatever you would do to extend the `pick`
> command would *not make any sense* to the current use case of the `pick`
> command.

It would actually make a lot of sense. Please don't use 'merge' to
pick commits, whether they are merges or not!

> The real problem, of course, is that a non-merge commit, when viewed from
> the perspective of the changes it introduced, is a very different beast
> than a merge commit: it does not need to reconcile changes, ever, because
> there is really only one "patch" to one revision. That is very different
> from a merge commit, whose changes can even disagree with one another (and
> in fact be resolved with changes disagreeing *yet again*)!

You'd still 'pick' it though, not 'merge' it. You don't merge a
"merge commit"; that makes no sense. It only makes sense when you get
rid of the original "merge commit" and re-merge from scratch, as we
have been doing till now.

>> > > Saying in favor of `--rebase-merges`, you mean as a separate option,
>> > > alongside `--recreate-merges` (once that series lands)?
>> > 
>> > No. I am against yet another option. The only reason I pollute the
>> > option name space further with --recreate-merges is that it would be
>> > confusing to users if the new mode was called --preserve-merges=v2
>> > (but work *totally differently*).
>> 
>> I see. So I take you`re thinking about renaming `--recreate-merges` to
>> `--rebase-merges` instead?
>
> Thinking about it. Nothing will happen before v2.17.0 on that front,
> though, because -- unlike you gentle people -- I have to focus on
> stabilizing Git's code base now.
>
>> That would seem sensible, too, I think, being the default usage mode in
>> the first place. Being able to actually (re)create merges, too, once
>> user goes interactive, would be "just" an additional (nice and powerful)
>> feature on top of it.
>
> The implementation detail is, of course, that I will introduce this with
> the technically-simpler strategy: always recreating merge commits with the
> recursive strategy. A follow-up patch series will add support for rebasing
> merge commits, and then use it by default.

Switching to use it by default would be backward incompatible again? Yet
another option to obsolete? Sigh. 
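(Editorial note: the rename discussed above did happen; the mode shipped as `--rebase-merges` in Git 2.18. A minimal throwaway-repo sketch of the behavior, assuming a modern git:)

```shell
#!/bin/sh
# Throwaway demo repo; requires Git >= 2.18 for --rebase-merges.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.name demo && git config user.email demo@example.com
git checkout -q -b base
echo 0 >f0 && git add f0 && git commit -qm B0
b0=$(git rev-parse HEAD)
git checkout -q -b work
echo 1 >f1 && git add f1 && git commit -qm W1
git checkout -q -b side "$b0"
echo 2 >f2 && git add f2 && git commit -qm S1
git checkout -q work
git merge -q --no-ff -m "merge side" side # the merge we want to keep
git checkout -q base
echo 3 >f3 && git add f3 && git commit -qm B1
git checkout -q work
git rebase -q --rebase-merges base        # recreates the merge on top of B1
echo "merge commits kept: $(git rev-list --count --merges base..HEAD)"
```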

-- Sergey


[PATCH v3 5/5] stash: convert pop to builtin

2018-03-26 Thread Joel Teichroeb
Add stash pop to the helper and delete the is_stash_ref,
assert_stash_ref, apply_stash, pop_stash and drop_stash functions
from the shell script now that they are no longer needed.
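(For reference, the porcelain behavior being moved into the builtin here -- pop is apply followed by drop on success -- can be exercised like this; a throwaway-repo sketch:)

```shell
#!/bin/sh
# Throwaway demo repo illustrating the pop = apply + drop equivalence.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.name demo && git config user.email demo@example.com
echo v1 >file && git add file && git commit -qm C0
echo v2 >file                      # a local change to stash
git stash push -q
test "$(cat file)" = v1            # worktree is clean again
git stash apply -q                 # restore the change...
git stash drop -q                  # ...then drop the entry: that is "pop"
test "$(cat file)" = v2
test -z "$(git stash list)" && echo "stash popped"
```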

Signed-off-by: Joel Teichroeb 
---
 builtin/stash--helper.c | 41 
 git-stash.sh| 50 -
 2 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index d755faf33..53c0d2171 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -12,6 +12,7 @@
 
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper drop [-q|--quiet] [<stash>]"),
+   N_("git stash--helper pop [--index] [-q|--quiet] [<stash>]"),
N_("git stash--helper apply [--index] [-q|--quiet] [<stash>]"),
N_("git stash--helper branch <branchname> [<stash>]"),
N_("git stash--helper clear"),
@@ -23,6 +24,11 @@ static const char * const git_stash_helper_drop_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_pop_usage[] = {
+   N_("git stash--helper pop [--index] [-q|--quiet] [<stash>]"),
+   NULL
+};
+
 static const char * const git_stash_helper_apply_usage[] = {
N_("git stash--helper apply [--index] [-q|--quiet] [<stash>]"),
NULL
@@ -513,6 +519,39 @@ static int drop_stash(int argc, const char **argv, const char *prefix)
return ret;
 }
 
+static int pop_stash(int argc, const char **argv, const char *prefix)
+{
+   int index = 0, ret;
+   struct stash_info info;
+   struct option options[] = {
+   OPT__QUIET(&quiet, N_("be quiet, only report errors")),
+   OPT_BOOL(0, "index", &index,
+   N_("attempt to recreate the index")),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+   git_stash_helper_pop_usage, 0);
+
+   if (get_stash_info(&info, argc, argv))
+   return -1;
+
+   if (assert_stash_ref(&info)) {
+   free_stash_info(&info);
+   return -1;
+   }
+
+   if (do_apply_stash(prefix, &info, index)) {
+   printf_ln(_("The stash entry is kept in case you need it again."));
+   free_stash_info(&info);
+   return -1;
+   }
+
+   ret = do_drop_stash(prefix, &info);
+   free_stash_info(&info);
+   return ret;
+}
+
 static int branch_stash(int argc, const char **argv, const char *prefix)
 {
const char *branch = NULL;
@@ -578,6 +617,8 @@ int cmd_stash__helper(int argc, const char **argv, const char *prefix)
result = clear_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "drop"))
result = drop_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "pop"))
+   result = pop_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "branch"))
result = branch_stash(argc, argv, prefix);
else {
diff --git a/git-stash.sh b/git-stash.sh
index c5fd4c6c4..8f2640fe9 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -554,50 +554,6 @@ assert_stash_like() {
}
 }
 
-is_stash_ref() {
-   is_stash_like "$@" && test -n "$IS_STASH_REF"
-}
-
-assert_stash_ref() {
-   is_stash_ref "$@" || {
-   args="$*"
-   die "$(eval_gettext "'\$args' is not a stash reference")"
-   }
-}
-
-apply_stash () {
-   cd "$START_DIR"
-   git stash--helper apply "$@"
-   res=$?
-   cd_to_toplevel
-   return $res
-}
-
-pop_stash() {
-   assert_stash_ref "$@"
-
-   if apply_stash "$@"
-   then
-   drop_stash "$@"
-   else
-   status=$?
say "$(gettext "The stash entry is kept in case you need it again.")"
-   exit $status
-   fi
-}
-
-drop_stash () {
-   assert_stash_ref "$@"
-
-   git reflog delete --updateref --rewrite "${REV}" &&
-   say "$(eval_gettext "Dropped \${REV} (\$s)")" ||
-   die "$(eval_gettext "\${REV}: Could not drop stash entry")"
-
-   # clear_stash if we just dropped the last stash entry
-   git rev-parse --verify --quiet "$ref_stash@{0}" >/dev/null ||
-   clear_stash
-}
-
 test "$1" = "-p" && set "push" "$@"
 
 PARSE_CACHE='--not-parsed'
@@ -634,7 +590,8 @@ push)
;;
 apply)
shift
-   apply_stash "$@"
+   cd "$START_DIR"
+   git stash--helper apply "$@"
;;
 clear)
shift
@@ -654,7 +611,8 @@ drop)
;;
 pop)
shift
-   pop_stash "$@"
+   cd "$START_DIR"
+   git stash--helper pop "$@"
;;
 branch)
shift
-- 
2.16.2



[PATCH v3 4/5] stash: convert branch to builtin

2018-03-26 Thread Joel Teichroeb
Add stash branch to the helper and delete the apply_to_branch
function from the shell script.

Signed-off-by: Joel Teichroeb 
---
 builtin/stash--helper.c | 47 +++
 git-stash.sh| 17 ++---
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index 0d7a0d55e..d755faf33 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -13,6 +13,7 @@
 static const char * const git_stash_helper_usage[] = {
N_("git stash--helper drop [-q|--quiet] [<stash>]"),
N_("git stash--helper apply [--index] [-q|--quiet] [<stash>]"),
+   N_("git stash--helper branch <branchname> [<stash>]"),
N_("git stash--helper clear"),
NULL
 };
@@ -27,6 +28,11 @@ static const char * const git_stash_helper_apply_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_branch_usage[] = {
+   N_("git stash--helper branch <branchname> [<stash>]"),
+   NULL
+};
+
 static const char * const git_stash_helper_clear_usage[] = {
N_("git stash--helper clear"),
NULL
@@ -507,6 +513,45 @@ static int drop_stash(int argc, const char **argv, const char *prefix)
return ret;
 }
 
+static int branch_stash(int argc, const char **argv, const char *prefix)
+{
+   const char *branch = NULL;
+   int ret;
+   struct argv_array args = ARGV_ARRAY_INIT;
+   struct stash_info info;
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+   git_stash_helper_branch_usage, 0);
+
+   if (argc == 0)
+   return error(_("No branch name specified"));
+
+   branch = argv[0];
+
+   if (get_stash_info(&info, argc - 1, argv + 1))
+   return -1;
+
+   argv_array_pushl(&args, "checkout", "-b", NULL);
+   argv_array_push(&args, branch);
+   argv_array_push(&args, oid_to_hex(&info.b_commit));
+   ret = cmd_checkout(args.argc, args.argv, prefix);
+   if (ret) {
+   free_stash_info(&info);
+   return -1;
+   }
+
+   ret = do_apply_stash(prefix, &info, 1);
+   if (!ret && info.is_stash_ref)
+   ret = do_drop_stash(prefix, &info);
+
+   free_stash_info(&info);
+
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
int result = 0;
@@ -533,6 +578,8 @@ int cmd_stash__helper(int argc, const char **argv, const char *prefix)
result = clear_stash(argc, argv, prefix);
else if (!strcmp(argv[0], "drop"))
result = drop_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "branch"))
+   result = branch_stash(argc, argv, prefix);
else {
error(_("unknown subcommand: %s"), argv[0]);
usage_with_options(git_stash_helper_usage, options);
diff --git a/git-stash.sh b/git-stash.sh
index 0b8f07b38..c5fd4c6c4 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -598,20 +598,6 @@ drop_stash () {
clear_stash
 }
 
-apply_to_branch () {
-   test -n "$1" || die "$(gettext "No branch name specified")"
-   branch=$1
-   shift 1
-
-   set -- --index "$@"
-   assert_stash_like "$@"
-
-   git checkout -b $branch $REV^ &&
-   apply_stash "$@" && {
-   test -z "$IS_STASH_REF" || drop_stash "$@"
-   }
-}
-
 test "$1" = "-p" && set "push" "$@"
 
 PARSE_CACHE='--not-parsed'
@@ -672,7 +658,8 @@ pop)
;;
 branch)
shift
-   apply_to_branch "$@"
+   cd "$START_DIR"
+   git stash--helper branch "$@"
;;
 *)
case $# in
-- 
2.16.2



[PATCH v3 3/5] stash: convert drop and clear to builtin

2018-03-26 Thread Joel Teichroeb
Add the drop and clear commands to the builtin helper. These two
are each simple, but are being added together as they are quite
related.

Unfortunately we have to keep the drop and clear functions in the
shell script, as they are called internally with parameters that
are not valid when the commands are invoked externally. Once pop
is converted they can both be removed.

Signed-off-by: Joel Teichroeb 
---
 builtin/stash--helper.c | 101 
 git-stash.sh|   4 +-
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
index df3c8e1e6..0d7a0d55e 100644
--- a/builtin/stash--helper.c
+++ b/builtin/stash--helper.c
@@ -11,7 +11,14 @@
 #include "dir.h"
 
 static const char * const git_stash_helper_usage[] = {
+   N_("git stash--helper drop [-q|--quiet] [<stash>]"),
N_("git stash--helper apply [--index] [-q|--quiet] [<stash>]"),
+   N_("git stash--helper clear"),
+   NULL
+};
+
+static const char * const git_stash_helper_drop_usage[] = {
+   N_("git stash--helper drop [-q|--quiet] [<stash>]"),
NULL
 };
 
@@ -20,6 +27,11 @@ static const char * const git_stash_helper_apply_usage[] = {
NULL
 };
 
+static const char * const git_stash_helper_clear_usage[] = {
+   N_("git stash--helper clear"),
+   NULL
+};
+
 static const char *ref_stash = "refs/stash";
 static int quiet;
 static char stash_index_path[PATH_MAX];
@@ -166,6 +178,29 @@ static int get_stash_info(struct stash_info *info, int argc, const char **argv)
return 0;
 }
 
+static int do_clear_stash(void)
+{
+   struct object_id obj;
+   if (get_oid(ref_stash, &obj))
+   return 0;
+
+   return delete_ref(NULL, ref_stash, &obj, 0);
+}
+
+static int clear_stash(int argc, const char **argv, const char *prefix)
+{
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+   git_stash_helper_clear_usage, PARSE_OPT_STOP_AT_NON_OPTION);
+
+   if (argc != 0)
+   return error(_("git stash--helper clear with parameters is unimplemented"));
+
+   return do_clear_stash();
+}
+
 static int reset_tree(struct object_id *i_tree, int update, int reset)
 {
struct unpack_trees_options opts;
@@ -410,6 +445,68 @@ static int apply_stash(int argc, const char **argv, const char *prefix)
return ret;
 }
 
+static int do_drop_stash(const char *prefix, struct stash_info *info)
+{
+   struct argv_array args = ARGV_ARRAY_INIT;
+   int ret;
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   argv_array_pushl(&args, "reflog", "delete", "--updateref", "--rewrite", NULL);
+   argv_array_push(&args, info->revision.buf);
+   ret = cmd_reflog(args.argc, args.argv, prefix);
+   if (!ret) {
+   if (!quiet)
+   printf(_("Dropped %s (%s)\n"), info->revision.buf, oid_to_hex(&info->w_commit));
+   } else {
+   return error(_("%s: Could not drop stash entry"), info->revision.buf);
+   }
+
+   cp.git_cmd = 1;
+   /* Even though --quiet is specified, rev-parse still outputs the hash */
+   cp.no_stdout = 1;
+   argv_array_pushl(&cp.args, "rev-parse", "--verify", "--quiet", NULL);
+   argv_array_pushf(&cp.args, "%s@{0}", ref_stash);
+   ret = run_command(&cp);
+
+   if (ret)
+   do_clear_stash();
+
+   return 0;
+}
+
+static int assert_stash_ref(struct stash_info *info)
+{
+   if (!info->is_stash_ref)
+   return error(_("'%s' is not a stash reference"), info->revision.buf);
+
+   return 0;
+}
+
+static int drop_stash(int argc, const char **argv, const char *prefix)
+{
+   struct stash_info info;
+   int ret;
+   struct option options[] = {
+   OPT__QUIET(&quiet, N_("be quiet, only report errors")),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options,
+   git_stash_helper_drop_usage, 0);
+
+   if (get_stash_info(&info, argc, argv))
+   return -1;
+
+   if (assert_stash_ref(&info)) {
+   free_stash_info(&info);
+   return -1;
+   }
+
+   ret = do_drop_stash(prefix, &info);
+   free_stash_info(&info);
+   return ret;
+}
+
 int cmd_stash__helper(int argc, const char **argv, const char *prefix)
 {
int result = 0;
@@ -432,6 +529,10 @@ int cmd_stash__helper(int argc, const char **argv, const char *prefix)
usage_with_options(git_stash_helper_usage, options);
else if (!strcmp(argv[0], "apply"))
result = apply_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "clear"))
+   result = clear_stash(argc, argv, prefix);
+   else if (!strcmp(argv[0], "drop"))
+   result = drop_stash(argc, argv, prefix);
else {
error(_("unknown subcommand: %s"), argv[0]);

[PATCH v3 2/5] stash: convert apply to builtin

2018-03-26 Thread Joel Teichroeb
Add a builtin helper for performing stash commands. Converting
them all at once proved hard to review, so starting with just
apply lets the conversion get going without the other commands
being finished.

The helper is implemented as a drop-in replacement for stash so
that when it is complete it can simply be renamed and the shell
script deleted.

Delete the contents of the apply_stash shell function and replace
it with a call to stash--helper apply until pop is also
converted.

Signed-off-by: Joel Teichroeb 
---
 .gitignore  |   1 +
 Makefile|   1 +
 builtin.h   |   1 +
 builtin/stash--helper.c | 442 
 git-stash.sh|  75 +---
 git.c   |   1 +
 6 files changed, 451 insertions(+), 70 deletions(-)
 create mode 100644 builtin/stash--helper.c

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..296d5f376 100644
--- a/.gitignore
+++ b/.gitignore
@@ -152,6 +152,7 @@
 /git-show-ref
 /git-stage
 /git-stash
+/git-stash--helper
 /git-status
 /git-stripspace
 /git-submodule
diff --git a/Makefile b/Makefile
index 96f6138f6..6cfdbe9a3 100644
--- a/Makefile
+++ b/Makefile
@@ -1028,6 +1028,7 @@ BUILTIN_OBJS += builtin/send-pack.o
 BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-ref.o
+BUILTIN_OBJS += builtin/stash--helper.o
 BUILTIN_OBJS += builtin/stripspace.o
 BUILTIN_OBJS += builtin/submodule--helper.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa..a14fd85b0 100644
--- a/builtin.h
+++ b/builtin.h
@@ -219,6 +219,7 @@ extern int cmd_shortlog(int argc, const char **argv, const char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_status(int argc, const char **argv, const char *prefix);
+extern int cmd_stash__helper(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
extern int cmd_submodule__helper(int argc, const char **argv, const char *prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
new file mode 100644
index 0..df3c8e1e6
--- /dev/null
+++ b/builtin/stash--helper.c
@@ -0,0 +1,442 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+#include "refs.h"
+#include "lockfile.h"
+#include "cache-tree.h"
+#include "unpack-trees.h"
+#include "merge-recursive.h"
+#include "argv-array.h"
+#include "run-command.h"
+#include "dir.h"
+
+static const char * const git_stash_helper_usage[] = {
+   N_("git stash--helper apply [--index] [-q|--quiet] [<stash>]"),
+   NULL
+};
+
+static const char * const git_stash_helper_apply_usage[] = {
+   N_("git stash--helper apply [--index] [-q|--quiet] [<stash>]"),
+   NULL
+};
+
+static const char *ref_stash = "refs/stash";
+static int quiet;
+static char stash_index_path[PATH_MAX];
+
+struct stash_info {
+   struct object_id w_commit;
+   struct object_id b_commit;
+   struct object_id i_commit;
+   struct object_id u_commit;
+   struct object_id w_tree;
+   struct object_id b_tree;
+   struct object_id i_tree;
+   struct object_id u_tree;
+   struct strbuf revision;
+   int is_stash_ref;
+   int has_u;
+};
+
+static int get_symbolic_name(const char *symbolic, struct strbuf *out)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   cp.git_cmd = 1;
+   argv_array_pushl(&cp.args, "rev-parse", "--symbolic-full-name", NULL);
+   argv_array_push(&cp.args, symbolic);
+   return pipe_command(&cp, NULL, 0, out, 0, NULL, 0);
+}
+
+static int have_stash(void)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   cp.git_cmd = 1;
+   cp.no_stdout = 1;
+   argv_array_pushl(&cp.args, "rev-parse", "--verify", "--quiet", NULL);
+   argv_array_push(&cp.args, ref_stash);
+   return pipe_command(&cp, NULL, 0, NULL, 0, NULL, 0);
+}
+
+static void free_stash_info(struct stash_info *info)
+{
+   strbuf_release(&info->revision);
+}
+
+static int get_stash_info(struct stash_info *info, int argc, const char **argv)
+{
+   struct strbuf w_commit_rev = STRBUF_INIT;
+   struct strbuf b_commit_rev = STRBUF_INIT;
+   struct strbuf w_tree_rev = STRBUF_INIT;
+   struct strbuf b_tree_rev = STRBUF_INIT;
+   struct strbuf i_tree_rev = STRBUF_INIT;
+   struct strbuf u_tree_rev = STRBUF_INIT;
+   struct strbuf symbolic = STRBUF_INIT;
+   struct strbuf out = STRBUF_INIT;
+   int ret;
+   const char *revision;
+   const char *commit = NULL;
+   char *end_of_rev;
+   info->is_stash_ref = 0;
+
+   if (argc > 1) {
+   int i;
+   struct strbuf refs_msg = STRBUF_INIT;
+   for (i = 0; i < argc; ++i)
+   

[PATCH v3 0/5] Convert some stash functionality to a builtin

2018-03-26 Thread Joel Teichroeb
I've been working on converting all of git stash to a builtin;
however, it's hard to get it all working at once with limited
time, so I've moved around half of it to a new stash--helper
builtin and called these functions from the shell script. Once
this is stabilized, it should be easier to convert the rest of
the commands one at a time without breaking anything.

I've sent most of this code before, but that was targeting a full
replacement of stash. The code is overall the same, but with some
code review changes and updates for internal API changes.

Since there seems to be interest from GSOC students who want to
work on converting builtins, I figured I should finish what I
have that works now so they could build on top of it.

Previous threads:
v1: https://public-inbox.org/git/20180325173916.GE10909@hank/T/
v2: https://public-inbox.org/git/20180326011426.19159-1-j...@teichroeb.net/

Changes from v2:
 - Fixed formatting (I keep forgetting to set vim to tabs)
 - Renamed destroy to free
 - Redid my tests to validate more (Thanks Johannes)
 - Deleted more shell code that isn't needed anymore

Joel Teichroeb (5):
  stash: improve option parsing test coverage
  stash: convert apply to builtin
  stash: convert drop and clear to builtin
  stash: convert branch to builtin
  stash: convert pop to builtin

 .gitignore  |   1 +
 Makefile|   1 +
 builtin.h   |   1 +
 builtin/stash--helper.c | 631 
 git-stash.sh| 136 +--
 git.c   |   1 +
 t/t3903-stash.sh|  16 ++
 7 files changed, 659 insertions(+), 128 deletions(-)
 create mode 100644 builtin/stash--helper.c

-- 
2.16.2



[PATCH v3 1/5] stash: improve option parsing test coverage

2018-03-26 Thread Joel Teichroeb
In preparation for converting the stash command incrementally to
a builtin command, this patch improves test coverage of the option
parsing, both for having too many parameters and for too few.

Signed-off-by: Joel Teichroeb 
---
 t/t3903-stash.sh | 16 
 1 file changed, 16 insertions(+)

diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index aefde7b17..8a666c60c 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -84,6 +84,17 @@ test_expect_success 'apply stashed changes (including index)' '
test 1 = $(git show HEAD:file)
 '
 
+test_expect_success 'giving too many ref arguments does nothing' '
+
+   for type in apply drop pop show "branch stash-branch"
+   do
+   test-chmtime =123456789 file &&
+   test_must_fail git stash $type stash@{0} stash@{1} 2>err &&
+   test_i18ngrep "Too many" err &&
+   test 123456789 = $(test-chmtime -v +0 file | sed 's/[^0-9].*$//') || return 1
+   done
+'
+
 test_expect_success 'unstashing in a subdirectory' '
git reset --hard HEAD &&
mkdir subdir &&
@@ -479,6 +490,11 @@ test_expect_success 'stash branch - stashes on stack, stash-like argument' '
test $(git ls-files --modified | wc -l) -eq 1
 '
 
+test_expect_success 'stash branch complains with no arguments' '
+   test_must_fail git stash branch 2>err &&
+   test_i18ngrep "No branch name specified" err
+'
+
 test_expect_success 'stash show format defaults to --stat' '
git stash clear &&
test_when_finished "git reset --hard HEAD" &&
-- 
2.16.2



Re: [RFC v2] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-26 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

> Hi Sergey,
>
> On Mon, 12 Mar 2018, Sergey Organov wrote:
>
>> [...]
>> 
>> Yet another consequence is that my approach will likely result in better
>> code reuse.
>
> This is a purely academic speculation. At least until somebody implements
> Phillip's method. Oh wait, I already started to implement it, and it was
> not exactly hard to implement:
>
> https://github.com/dscho/git/commit/26d2858800a4e0d3cc6313ddb54dd4d2ce516f31

Nice! Please see [1] for some recent relevant discussion.

[1] https://public-inbox.org/git/87efkn6s1h@javad.com/

-- Sergey


Re: [RFC v2] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-26 Thread Sergey Organov
Dear Johannes,

Johannes Schindelin  writes:

> Hi Sergey,
>
> On Mon, 12 Mar 2018, Sergey Organov wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > [...]
>> >
>> > Where "easy" meant that I had to spend 1h still to figure out why
>> > using the unrebased merge parents as merge bases.
>> 
>> That's because you try to figure out something that is not there in the
>> [RFC v2]. I suggest to forget everything you've already imagined and
>> just read the [RFC v2] proposal afresh. It should take about 10 minutes
>> or less to get it. Really.
>> 
>> > The same amount of time did not allow me to wrap my head around
>> > Sergey's verbose explanations.
>> 
>> Honestly, I don't believe it, sorry, but I'm willing to explain anything
>> you wish to be explained in _[RFC v2]_.
>
> No, really. If you cannot bring yourself to believe my words, then I hate
> to break it to you: I am not lying.
>
> As to "I'm willing to explain anything you wish to be explained in RFC
> v2": I was asking, and asking, and asking again, for a simple summary of
> the idea behind your proposal. Nothing. That was the answer.

No. The answer was rather the simple explanation that I gave you
multiple times already: "rebase each side of the merge, then merge the
results back using the original merge commit as the merge base". Yet
you say there was none. I'm confused.

Well, since you seem to grok Phillip's notation just fine, here is the
RFC algorithm in that notation [1]:

git checkout --detach A'
git merge-recursive A -- A' M
tree_U1'=$(git write-tree)
git checkout --detach B'
git merge-recursive B -- B' M
tree_U2'=$(git write-tree)
git merge-recursive M -- $tree_U1' $tree_U2'
tree=$(git write-tree)
M'=$(git log --pretty=%B -1 M | git commit-tree $tree -pA' -pB')

> I had to figure it out myself: the idea is to *create* fake commits,
> non-merge ones, for every single merge commit parent. Those fake commits
> combine the changes of *all* merge commit parents *but one*. And then
> those commits are rebased, individually, with tons of opportunities for
> merge conflicts. Repeated ones. And then that result is merged.

Wrong. See above.

Anyway, it doesn't matter anymore, see [1].

References:

[1] https://public-inbox.org/git/87efkn6s1h@javad.com

-- Sergey


Re: [RFC] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-26 Thread Sergey Organov
Johannes Schindelin  writes:

> Hi Sergey,
>
> On Mon, 12 Mar 2018, Sergey Organov wrote:
>
>> Johannes Schindelin  writes:
>> >
>> > On Wed, 7 Mar 2018, Sergey Organov wrote:
>> >
>> >> Johannes Schindelin  writes:
>> >> 
>> >> > How can your approach -- which relies *very much* on having the
>> >> > original parent commits -- not *require* that consistency check?
>> >> 
>> >> I don't understand what you mean, sorry. Could you please point me to
>> >> the *require* you talk about in the original proposal?
>> >
>> > Imagine a todo list that contains this line
>> >
>> >merge -C abcdef 123456
>> >
>> > and now the user edits it (this is an interactive rebase, after all),
>> > adding another merge head:
>> >
>> >merge -C abcdef 987654 123456
>> >
>> > Now your strategy would have a serious problem: to find the original
>> > version of 987654. If there was one.
>> 
>> We are talking about different checks then. My method has a built-in
>> check that Pillip's one doesn't.
>
> Since you did not bother to elaborate, I have to assume that your
> "built-in check" is that thing where intermediate merges can give you
> conflicts?
>
> If so, there is a possibility in Phillip's method for such conflicts, too:
> we have to perform as many 3-way merges as there are parent commits.
>
> It does make me uncomfortable to have to speculate what you meant,
> though.

It doesn't matter anymore as this check could easily be added to
Phillip's algorithm as well, see [1].

[1] https://public-inbox.org/git/87efkn6s1h@javad.com
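(For concreteness, the `merge -C` command discussed above appears in todo lists of roughly this shape, as generated by the recreate-merges machinery; the hashes here are made up:)

```
label onto

# Branch side
reset onto
pick 123456 Some topic commit
label side

reset onto
pick 654321 Some mainline commit
merge -C abcdef side # Merge branch 'side'
```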


Re: [RFC] Rebasing merges: a journey to the ultimate solution (Road Clear)

2018-03-26 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:
> Hi Buga,
>
> On Tue, 13 Mar 2018, Igor Djordjevic wrote:
>
>> On 12/03/2018 13:56, Sergey Organov wrote:
>> > 
>> > > > I agree with both of you that `pick <merge-commit>` is inflexible
>> > > > (not to say just plain wrong), but I never thought about it like
>> > > > that.
>> > > >
>> > > > If we are to extract further mentioned explicit old:new merge
>> > > > parameter mapping to a separate discussion point, what we`re
>> > > > eventually left with is just replacing this:
>> > > >
>> > > >merge -R -C <original-merge> <merge-head>
>> > > >
>> > > > ... with this:
>> > > >
>> > > >pick <original-merge> <merge-head>
>> > >
>> > > I see where you are coming from.
>> > >
>> > > I also see where users will be coming from. Reading a todo list in
>> > > the editor is as much documentation as it is a "program to execute".
>> > > And I am afraid that reading a command without even mentioning the
>> > > term "merge" once is pretty misleading in this setting.
>> > >
>> > > And even from the theoretical point of view: cherry-picking
>> > > non-merge commits is *so much different* from "rebasing merge
>> > > commits" as discussed here, so much so that using the same command
>> > > would be even more misleading.
>> > 
>> > This last statement is plain wrong when applied to the method in the
>> > [RFC] you are replying to.
>
> That is only because the RFC seems to go out of its way to break down a
> single merge commit into as many commits as there are merge commit
> parents.

A complex entity is being split up for ease of reasoning; people do
this all the time.

> This is a pretty convoluted way to think about it: if you have three
> parent commits, for example, that way of thinking would introduce three
> intermediate commits, one with the changes of parent 2 & 3 combined, one
> with the changes of parent 1 & 3 combined, and one with the changes of
> parent 1 & 2 combined.

No.

> To rebase those commits, you essentially have to rebase *every parent's
> changes twice*.

No.

> It gets worse with merge commits that have 4 parents. In that case, you
> have to rebase every parent's changes *three times*.

Sorry, the [RFC] has nothing of the above. Once again, it is still
just as simple as: rebase every side of the merge, then merge the
results using the original merge commit as the merge base.

And if you can't or don't want to grok the explanation in the RFC, just
forget the explanation, no problem.

> And so on.
>
>> > Using the method in [RFC], "cherry-pick non-merge" is nothing more or
>> > less than reduced version of generic "cherry-pick merge", exactly as
>> > it should be.
>
> I really get the impression that you reject Phillip's proposal on the
> ground of not being yours. In other words, the purpose of this here
> argument is to praise one proposal because of its heritage, rather than
> trying to come up with the best solution.

No. As the discussion evolved, I came to the conclusion that the
modified Phillip's algorithm is actually better suited for the
implementation [1].

> On that basis, I will go with the proposal that is clearly the simplest
> and does the job and gets away with avoiding unnecessary work.

These algorithms are actually the same one, as has already been shown
elsewhere in the discussion. The asymmetric, incremental nature of
Phillip's is apparently better suited to the naturally asymmetrical
way Git already handles merging. FYI, here is the latest proposal that
came out of the discussion [1]:

git-rebase-first-parent --onto A' M
tree_U1'=$(git write-tree)
git merge-recursive B -- $tree_U1' B'
tree=$(git write-tree)
M'=$(git log --pretty=%B -1 M | git commit-tree $tree -p A' -p B')
[ $conflicted_last_merge = "yes" ] ||
  trees-match $tree_U1' $tree ||
  stop-for-user-amendment

where 'git-rebase-first-parent' denotes whatever machinery is currently
being used to rebase a simple non-merge commit.
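The "trees-match" step in the snippet above could be spelled as a small
helper: two revisions match if they peel to the same tree object. This
is only a sketch (the helper name comes from the pseudocode, and the
demo repository is made up):

```shell
# Possible spelling of "trees-match": compare the peeled tree objects.
trees_match () {
	test "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")"
}

# Toy demonstration: a commit trivially tree-matches itself.
git init -q demo-trees
git -C demo-trees -c user.name=d -c user.email=d@example.com \
	commit -q --allow-empty -m init
(cd demo-trees && trees_match HEAD HEAD && echo "trees match")
```

In the rebase scenario the two arguments would be the tree produced by
rebasing the first-parent side and the tree of the final merge.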

>
>> > Or, in other words, "cherry-pick merge" is generalization of
>> > "cherry-pick non-merge" to multiple parents.
>> 
>> I think Sergey does have a point here, his approach showing it.
>
> His approach is showing that he wants to shoehorn the "rebase a merge
> commit" idea into a form where you can cherry-pick *something*.
>
> It does not have to make sense. And to me, it really does not.

Except that Phillip's one does exactly this as well, only in an
incremental manner, as shown in [1].

>
>> Phillip`s simplification might be further from it, though, but we`re 
>> talking implementation again - important mental model should just be 
>> "rebasing a commit" (merge or non-merge), how we`re doing it is 
>> irrelevant for the user, the point (goal) is the same.
>
> Except that Phillip's simplification is not a simplification. It comes
> from a different point of view: trying to reconcile the diverging
> changes.

They are essentially the same, as one easily converts to the other and
back [1]. They will only bring a different user experience in case of
conflicts.

> Phillip's is a true generalization of the 

Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Junio C Hamano
Jeff Hostetler  writes:

> I did the uint64_t for the unsigned ns times.
>
> I did the other one for the usual signed ints.
>
> I could convert them both to a single signed 64 bit typed function
> if we only want to have one function.

I still think having a sized version is a horrible idea, and recommend
instead using "the widest possible on the platform" type, for the
same reason why you would only have a variant for double but not for
float.

intmax_t and uintmax_t are by definition wide enough to hold any value
that would fit in any integral type the platform supports. So if a
caller that wants to handle a 64-bit unsigned timestamp, for example,
uses a uint64_t variable to pass such a timestamp around, and the
platform is capable of grokking that code, you should be able to
safely pass that to the JSON serializer you are writing that takes
uintmax_t just fine: (1) a caller that passes around uint64_t
timestamps won't compile on a platform that is incapable of doing
64-bit, and then you have bigger problems than uintmax_t being
narrower than 64-bit on such a platform, and (2) your caller can just
pass a uint64_t value to your JSON formatter that expects uintmax_t
without explicit casting, as the normal integral type promotion rules
would apply.

So I would think it is most sensible to have double, uintmax_t and
intmax_t variants.  If you do not care about the extra value range
that unsigned integral types afford, a single intmax_t variant would
also be fine.



Re: [RFC PATCH 1/1] json-writer: incorrect format specifier

2018-03-26 Thread Ramsay Jones


On 26/03/18 18:04, Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
 @@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char *key, uint64_t value)
 	maybe_add_comma(jw);
  
 	append_quoted_string(&jw->json, key);
 -	strbuf_addf(&jw->json, ":%"PRIuMAX, value);
 +	strbuf_addf(&jw->json, ":%"PRIu64, value);
>>>
>>> In this code-base, that would normally be written as:
>>>
>>> strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t) value);
>>
>> heh, I should learn not to reply in a hurry, just before
>> going out ...
>>
>> I had not noticed that 'value' was declared with an 'sized type'
>> of uint64_t, so using PRIu64 should be fine.
> 
> But why is this codepath using a sized type in the first place?  It
> is not like it wants to read/write a fixed binary file format---it
> just wants to use an integer type that is wide enough to handle any
> inttype the platform uses, for which uintmax_t would be a more
> appropriate type, no?

I must confess to not having given any thought to the wider
implications of the code. I don't really know what this code
is going to be used for. [Although I did shudder when I read
some mention of a 'universal interchange format' - I still
have nightmares about XML :-D ]

ATB,
Ramsay Jones




Re: [PATCH v4] json_writer: new routines to create data in JSON format

2018-03-26 Thread Ramsay Jones


On 26/03/18 15:31, g...@jeffhostetler.com wrote:
> From: Jeff Hostetler 
> 
> Add a series of jw_ routines and "struct json_writer" structure to compose
> JSON data.  The resulting string data can then be output by commands wanting
> to support a JSON output format.
> 
> The json-writer routines can be used to generate structured data in a
> JSON-like format.  We say "JSON-like" because we do not enforce the Unicode
> (usually UTF-8) requirement on string fields.  Internally, Git does not
> necessarily have Unicode/UTF-8 data for most fields, so it is currently
> unclear the best way to enforce that requirement.  For example, on Linux
> pathnames can contain arbitrary 8-bit character data, so a command like
> "status" would not know how to encode the reported pathnames.  We may want
> to revisit this (or double encode such strings) in the future.
> 
> The initial use for the json-writer routines is for generating telemetry
> data for executed Git commands.  Later, we may want to use them in other
> commands, such as status.
> 
> Helped-by: René Scharfe 
> Helped-by: Wink Saville 
> Helped-by: Ramsay Jones 
> Signed-off-by: Jeff Hostetler 
> ---
>  Makefile|   2 +
>  json-writer.c   | 395 +
>  json-writer.h   |  92 +++
>  t/helper/test-json-writer.c | 590 
> 
>  t/t0019-json-writer.sh  | 253 +++
>  5 files changed, 1332 insertions(+)
>  create mode 100644 json-writer.c
>  create mode 100644 json-writer.h
>  create mode 100644 t/helper/test-json-writer.c
>  create mode 100755 t/t0019-json-writer.sh
> 
[snip]

Thanks, this version fixes all issues I had (with the compilation
and sparse warnings).

[Was using UINT64_C(0x) a problem on windows?]

ATB,
Ramsay Jones




How to `git blame` individual characters?

2018-03-26 Thread Jan Keromnes
Dear Git developers,

I'd like to understand what would be required to run `git blame` on
individual characters instead of full lines. Could you please point me
in the right direction?

Someone asked a similar question about "Word-by-word blame/annotate"
on StackOverflow a few years ago, and one of the replies said:

> Git tracks changes snapshot by snapshot. The line-based format is calculated 
> on-the-fly, and it would also be possible to have a word-based format.

Source: https://stackoverflow.com/q/17758008/3461173

This leaves me hopeful that a character-based format can somehow be
achieved. Here is a fake example to illustrate what I'm looking for:

$ cat myscript.js
for (int foo = 0; foo <= 11; foo++) { console.log(foo); }
$ git blame --character-based --pseudo-json myscript.js
[
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": "for (int " },
{ "commit": "bcd1234a", "summary": "Rename iterator",
"characters": "foo" },
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": " = 0; " },
{ "commit": "bcd1234a", "summary": "Rename iterator",
"characters": "foo" },
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": " <= " },
{ "commit": "cd1234ab", "summary": "Go up to 11",
"characters": "11" },
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": "; " },
{ "commit": "bcd1234a", "summary": "Rename iterator",
"characters": "foo" },
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": "++) { console.log(" },
{ "commit": "bcd1234a", "summary": "Rename iterator",
"characters": "foo" },
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": ")" },
{ "commit": "d1234abc", "summary": "Add missing
semicolon", "characters": ";" },
{ "commit": "abcd1234", "summary": "Implement loop",
"characters": " }" }
]

What would be the most direct way to achieve such a character-based
blame/annotate? Should I write some sort of Git extension, or hack
into Git's source code?

E.g. I looked for an option in `git-blame` or `git-annotate` to change
the "next line boundary" from "carret return" (line-based blame) to
"any whitespace" (word-based blame) or "character-by-character"
(character-based blame), but I didn't find it. Could this be
implemented in `blame.c`? If so, which methods would need tweaking?

Many thanks!
Jan


Fwd: New Defects reported by Coverity Scan for git

2018-03-26 Thread Stefan Beller
Coverity scan has failed for the last couple of months (since Nov 20th)
without me noticing; I plan on running it again nightly for the
Git project.

Anyway, here are issues that piled up (in origin/pu) since then.

Stefan


-- Forwarded message --
From:  
Date: Mon, Mar 26, 2018 at 4:24 PM
Subject: New Defects reported by Coverity Scan for git
To: sbel...@google.com


Hi,

Please find the latest report on new defect(s) introduced to git found
with Coverity Scan.

44 new defect(s) introduced to git found with Coverity Scan.
32 defect(s), reported by Coverity Scan earlier, were marked fixed in
the recent build analyzed by Coverity Scan.

New defect(s) Reported-by: Coverity Scan
Showing 20 of 44 defect(s)


** CID 1433546:  Resource leaks  (RESOURCE_LEAK)
/t/helper/test-path-utils.c: 236 in cmd__path_utils()



*** CID 1433546:  Resource leaks  (RESOURCE_LEAK)
/t/helper/test-path-utils.c: 236 in cmd__path_utils()
230 if (argc >= 4 && !strcmp(argv[1], "prefix_path")) {
231 const char *prefix = argv[2];
232 int prefix_len = strlen(prefix);
233 int nongit_ok;
234 setup_git_directory_gently(_ok);
235 while (argc > 3) {
>>> CID 1433546:  Resource leaks  (RESOURCE_LEAK)
>>> Failing to save or free storage allocated by "prefix_path(prefix, 
>>> prefix_len, argv[3])" leaks it.
236 puts(prefix_path(prefix, prefix_len, argv[3]));
237 argc--;
238 argv++;
239 }
240 return 0;
241 }

** CID 1433545:  Security best practices violations  (STRING_OVERFLOW)
/merge-recursive.c: 1955 in check_dir_renamed()



*** CID 1433545:  Security best practices violations  (STRING_OVERFLOW)
/merge-recursive.c: 1955 in check_dir_renamed()
1949  struct
hashmap *dir_renames)
1950 {
1951char temp[PATH_MAX];
1952char *end;
1953struct dir_rename_entry *entry;
1954
>>> CID 1433545:  Security best practices violations  (STRING_OVERFLOW)
>>> You might overrun the 4096-character fixed-size string "temp" by 
>>> copying "path" without checking the length.
1955strcpy(temp, path);
1956while ((end = strrchr(temp, '/'))) {
1957*end = '\0';
1958entry = dir_rename_find_entry(dir_renames, temp);
1959if (entry)
1960return entry;

** CID 1433544:  Resource leaks  (RESOURCE_LEAK)
/builtin/submodule--helper.c: 66 in print_default_remote()



*** CID 1433544:  Resource leaks  (RESOURCE_LEAK)
/builtin/submodule--helper.c: 66 in print_default_remote()
60  die(_("submodule--helper print-default-remote takes no
arguments"));
61
62  remote = get_default_remote();
63  if (remote)
64  printf("%s\n", remote);
65
>>> CID 1433544:  Resource leaks  (RESOURCE_LEAK)
>>> Variable "remote" going out of scope leaks the storage it points to.
66  return 0;
67 }
68
69 static int starts_with_dot_slash(const char *str)
70 {
71  return str[0] == '.' && is_dir_sep(str[1]);

** CID 1433543:  Null pointer dereferences  (NULL_RETURNS)
/merge-recursive.c: 812 in was_dirty()



*** CID 1433543:  Null pointer dereferences  (NULL_RETURNS)
/merge-recursive.c: 812 in was_dirty()
806 int dirty = 1;
807
808 if (o->call_depth || !was_tracked(path))
809 return !dirty;
810
811 ce = cache_file_exists(path, strlen(path), ignore_case);
>>> CID 1433543:  Null pointer dereferences  (NULL_RETURNS)
>>> Dereferencing a null pointer "ce".
812 dirty = (ce->ce_stat_data.sd_mtime.sec > 0 &&
813  verify_uptodate(ce, &o->unpack_opts) != 0);
814 return dirty;
815 }
816
817 static int make_room_for_path(struct merge_options *o, const char *path)

** CID 1433542:  Error handling issues  (CHECKED_RETURN)
/merge-recursive.c: 2162 in apply_directory_rename_modifications()



*** CID 1433542:  Error handling issues  (CHECKED_RETURN)
/merge-recursive.c: 2162 in apply_directory_rename_modifications()
2156 * "NOTE" in update_stages(), doing so will modify the current
2157 * in-memory index which will break calls to

Re: git submodule update - reset instead of checkout?

2018-03-26 Thread Stefan Beller
On Fri, Mar 9, 2018 at 7:07 AM Andreas Krey  wrote:

> Hi everyone,

> I've been reading up on the current state of git submodules
> (our customer's customers want to use them, causing a slight
> groan from our customer).

> The usability thing I wonder about - with git submodule update,
> is it necessary to detach the head to the current (or upstream)
> head, instead of resetting the current branch to that state?

Try "git checkout --recurse-submodules" or
"git reset --recurse-submodules"  (there is also the
submodule.recurse option in case you don't want to type
the option all the time)
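A minimal illustration of the submodule.recurse option (the repository
name here is made up):

```shell
# Set submodule.recurse once in the superproject; afterwards plain
# "git checkout" and "git reset" recurse into submodules on their own.
git init -q super
git -C super config submodule.recurse true

# Show the setting that checkout/reset will now honor:
git -C super config --get submodule.recurse
```

With this set, there is no need to remember --recurse-submodules on
each checkout or reset invocation.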


> Primary interest in the question: Seeing 'detached head' scares
> almost everybody. To brainstorm:

I agree on that. That is what we are trying to work out
eventually, too.

One idea is to "reattach the submodule branch if it fits"
another idea would be a submodule ref store that is
(partially) tied to the superproject, such that the HEAD
of the submodule is non-existent for most of the time.
https://public-inbox.org/git/cover.1512168087.git.jonathanta...@google.com/

> - as we can already use 'submodule update --remote' to update
>to the remote head of a branch, it would be logical to have
>that branch checked out locally (and unfortunately, potentially
>have that branch's name conflict with the remote branch setup).

> - when developers more or less accidentally commit on the detached
>head, all is not lost yet (I remember this being differently),
>but if the next 'submodule update' just resets again, the commit
>just made is still dropped, just as in the detached head state.

> - So, we'd need to have 'submodule update' act closer to the pull or
>rebase counterparts and refuse to just lose commits (or uncommitted
>changes).

> Having a checked-out branch in the submodule would have the advantage
> that I can 'just' push local commits. At the moment, doing that requires
> a relatively intricate dance, not at all similar to working in the
> base (parent) module.

> I'm working on hooks that automatically update the submodules after
> a commit change (merge, pull, checkout) in the parent module, but
> with the additional feature of (a) keeping a branch checked out
> and (b) avoid discarding local changes. Probably means I shouldn't
> invoke 'submodule update' at all, and deal with everything myself.

> Any thoughs/comments/helpful hints?

Our plan is to deprecate "git submodule" just like "git remote" is not
a well known tool any more. (e.g. Instead of git remote update, use
git fetch, that learned about updating the remote tracking branches
on its own)


> (Addional complexity: egit/jgit is in use as well, and the work model
> we will arrive at probably needs to work with the current egit.)

That sounds familiar, we also have JGit/Gerrit in the setup.

Stefan


Re: Will Coverity for FOSS be in scheduled maintenance mode forever?

2018-03-26 Thread Stefan Beller
On Tue, Mar 13, 2018 at 7:00 AM Johannes Schindelin <
johannes.schinde...@gmx.de> wrote:

> Hi Coverity (now Synopsys) team,

> I probably managed to miss important mails such as announcing that the
> Coverity service for FOSS projects would be unavailable for some time.

> Since February 20th, this seems to be the case (all attempts at even
> looking at past results redirects to https://brb.synopsys.com/ without any
> helpful further information), at least for the Git for Windows project.

> Is the deal off? Or will Coverity for FOSS come back? If the latter, when?

I noticed that there was an off period, but right now I can access it again.

Thanks for writing this email, it reminded me to check for the setup of Git
as well, which was not executed nightly since Nov 20th last year.
(the day when I migrated to a new machine... duct tape everywhere)

Thanks,
Stefan


Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default

2018-03-26 Thread Stefan Beller
[snipped the cc list as well]

On Tue, Mar 6, 2018 at 12:06 PM Eddy Petrișor 
wrote:

> Signed-off-by: Eddy Petrișor 
> ---

Did this go anywhere?
(I just came back from a longer vacation, sorry for the delay on my site)

> There are projects such as llvm/clang which use several repositories, and
they
> might be forked for providing support for various features such as adding
Redox
> awareness to the toolchain. This typically means the superproject will use
> another branch than master, occasionally even use an old commit from that
> non-master branch.

> Combined with the fact that when incorporating such a hierarchy of
repositories
> usually the user is interested in just the exact commit specified in the
> submodule info, it follows that a desirable use case is to be also able to
> provide '--depth 1' to avoid waiting for ages for the clone operation to
> finish.

Very sensible.

> Git submodule seems to be very stubborn and cloning master, although the
> wrapper script and the gitmodules-helper could work together to clone
directly
> the branch specified in the .gitmodules file, if specified.

Also very sensible.

So far so good, could you move these paragraphs before the triple dashed
line
and sign off so we record it as the commit message?

> Another wrinkle is that when the commit is not the tip of the branch, the
depth
> parameter should somehow be stored in the .gitmodules info, but any
change in
> the submodule will break the supermodule submodule depth info sooner or
later,
> which is definitely fragile.

... which is why I would not include that.

git-fetch knows about --shallow-since, or even better --shallow-exclude,
which could be set to the (depth+1)-th commit (the boundary commit)
recorded in the shallow information.
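To make the boundary-commit idea concrete, here is a toy sketch (all
repository and tag names are made up): an upstream with two commits, and
a shallow clone that excludes history up to and including the tagged
boundary commit.

```shell
# Build a toy upstream with a tagged boundary commit.
git init -q upstream
git -C upstream -c user.name=d -c user.email=d@example.com \
	commit -q --allow-empty -m one
git -C upstream tag boundary
git -C upstream -c user.name=d -c user.email=d@example.com \
	commit -q --allow-empty -m two

# Clone shallowly, excluding everything reachable from "boundary";
# --no-local forces the normal transport so the deepen-not capability
# is exercised.
git clone -q --no-local --shallow-exclude=boundary upstream shallow-copy
git -C shallow-copy log --oneline
```

Only the commit "two" is fetched; the boundary and its ancestors stay
out of the shallow history, with no brittle depth number recorded
anywhere.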

> I tried digging into this section of the code and debugging with bashdb
to see
> where --depth might fit, but I got stuck on the shell-to-helper
interaction and
> the details of the submodule implementation, so I want to lay out this
first
> patch as starting point for the discussion in the hope somebody else
picks it
> up or can provide some inputs. I have the feeling there are multiple code
paths
> that are being ran, depending on the moment (initial clone, submodule
> recursive, post-clone update etc.) and I have a gut feeling there
shouldn't be
> any code duplication just because the operation is different.

> This first patch is only trying to use a non-master branch, I have some
changes
> for the --depth part, but I stopped working on it due to the "default
depth"
> issue above.

> Does any of this sound reasonable?
> Is this patch idea usable or did I managed to touch the part of the code
that
> should not be touched?

This sounds reasonable. Thanks for writing the patch!

> diff --git a/git-submodule.sh b/git-submodule.sh
> index 2491496..370f19e 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -589,8 +589,11 @@ cmd_update()
>  branch=$(git submodule--helper remote-branch
"$sm_path")
>  if test -z "$nofetch"
>  then
> +   # non-default branch
> +   rbranch=$(git config -f .gitmodules
submodule.$sm_path.branch)
> +
br_refspec=${rbranch:+"refs/heads/$rbranch:refs/heads/$rbranch"}

Wouldn't we want to fetch into a remote tracking branch instead?
Instead of computing all this by yourself, these two lines could be

 br_refspec=$(git submodule--helper remote-branch $sm_path)

I would think.

>  # Fetch remote before determining
tracking $sha1
> -   fetch_in_submodule "$sm_path" $depth ||
> +   fetch_in_submodule "$sm_path" $depth
$br_refspec ||
>  die "$(eval_gettext "Unable to fetch in
submodule path '\$sm_path'")"
>  fi
>  remote_name=$(sanitize_submodule_env; cd
"$sm_path" && get_default_remote)

It would be awesome if you could write a little test for this feature, too.
Look for the tests regarding --remote in t7406 (in the t/ directory) as
a starting point, please.
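A rough sketch of what such a test might look like in git's test
framework (the branch and path names are made up, and this depends on
t/test-lib.sh, so it is not runnable on its own):

```
test_expect_success 'update --remote fetches the branch from .gitmodules' '
	(cd super &&
	 git config -f .gitmodules submodule.submodule.branch topic &&
	 git submodule update --remote submodule &&
	 git -C submodule rev-parse --verify origin/topic
	)
'
```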

Thanks!
Stefan


Re: [PATCH v2 01/36] t/helper: add an empty test-tool program

2018-03-26 Thread SZEDER Gábor
On Tue, Mar 27, 2018 at 12:14 AM, Johannes Schindelin
 wrote:
> However, it seems that something is off, as
> ba5bec9589e9eefe2446044657963e25b7c8d88e is reported as fine on Windows:
> https://travis-ci.org/git/git/jobs/358260023 (while there is clearly a red
> X next to that commit in
> https://github.com/git/git/commits/ba5bec9589e9eefe2446044657963e25b7c8d88e,
> that X is due to a hiccup on macOS).
>
> It seems that the good-trees feature for Travis does not quite work as
> intended. Gábor?

AFAICT it works as expected.

When a build job encounters a commit with a tree that has previously
been built and tested successfully, then first it says so, like this:

  https://travis-ci.org/szeder/git/jobs/347295038#L635

and then skips the rest of the build job (see the 'exit 0' a few lines
later).

In case of this Windows build job we haven't seen this tree yet:

  https://travis-ci.org/git/git/jobs/358260023#L467

so the build job continues as usual (see the 'test -z Windows' two lines
later).
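The short-circuit described above can be sketched roughly as follows
(the file name and repository are hypothetical; the real CI script is
more involved):

```shell
# Make a throwaway repo standing in for the commit under test.
git init -q ci-demo
git -C ci-demo -c user.name=d -c user.email=d@example.com \
	commit -q --allow-empty -m initial

good_trees_file=good-trees.txt
current_tree=$(git -C ci-demo rev-parse 'HEAD^{tree}')

# Skip the build when this tree was already tested successfully.
if grep -qs "$current_tree" "$good_trees_file"
then
	echo "Tree already tested successfully; skipping build."
else
	echo "New tree; running the full build..."
	# ... build and test here, then record the tree on success:
	echo "$current_tree" >>"$good_trees_file"
fi
```

Keying on the tree (rather than the commit) is what lets rebased but
content-identical commits reuse an earlier green result.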

Unfortunately, I have no idea about how the rest of the Windows build
job is supposed to work...


Re: [PATCH 04/10] t1405: sort reflog entries in a hash-independent way

2018-03-26 Thread brian m. carlson
On Mon, Mar 26, 2018 at 03:18:20PM -0700, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> 
> > The test enumerates reflog entries in an arbitrary order and then sorts
> > them.  For SHA-1, this produces results that happen to sort in
> > alphabetical order, but for other hash algorithms they sort differently.
> > Ensure we sort the reflog entries in a hash-independent way by sorting
> > on the ref name instead of the object ID.
> 
> Makes sense.
> 
> > Remove an assumption about the length of a hash by using cut with
> > the delimiter and field options instead of the character range
> > option.
> 
> I thought you used your truncated blake hash to develop and verify
> these changes, but I'd imagine that the "42" thing were hard to
> spot.

Yeah, I had to do those by hand, but it looks like I did miss one.  I'll
reroll with that change.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Including object type and size in object id (Re: Git Merge contributor summit notes)

2018-03-26 Thread Junio C Hamano
Jonathan Nieder  writes:

> This implies a limit on the object size (e.g. 5 bytes in your
> example).  What happens when someone wants to encode an object larger
> than that limit?
>
> This also decreases the number of bits available for the hash, but
> that shouldn't be a big issue.

I actually thought that the latter "downside" makes the object name
a tad larger.

But let's not go there, really.

"X is handy if we can get it on the surface without looking into it"
will grow.  Somebody may want to have the generation number of a
commit in the commit object name.  Yet another somebody may want to
be able to quickly learn the object name for the top-level tree from
the commit object name alone.  We need to stop somewhere, and as
already suggested in the thread(s), having auxiliary look-up table
is a better way to go, encoding nothing in the name, as we are going
to need such a look-up table because it is unrealistic to encode
everything we would want in the name anyway.



Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Johannes Schindelin
Hi all,

On Mon, 26 Mar 2018, Wink Saville wrote:

> > I was just going by what the reported compiler error message was.
> > It said that "unsigned long" didn't match the uint64_t variable.
> > And that made me nervous.
> >
> > If all of the platforms we build on define uintmax_t >= 64 bits,
> > then it doesn't matter.
> >
> > If we do have a platform where uintmax_t is u32, then we'll have a
> > lot more breakage than in just the new function I added.
> >
> > Thanks,
> > Jeff
> 
> Should we add a "_Static_assert" that sizeof(uintmax_t) >= sizeof(uint64_t) ?

To come back to the subject that all of the mails in this here mail thread
of you gentle people bear: Wink's patches look good to me, and I would
like to have them be bumped down the next/master waterfall.

Ciao,
Dscho


Re: [PATCH 00/10] Hash-independent tests (part 1)

2018-03-26 Thread brian m. carlson
On Sun, Mar 25, 2018 at 10:10:21PM -0400, Eric Sunshine wrote:
> What's the plan for oddball cases such as 66ae9a57b8 (t3404: rebase
> -i: demonstrate short SHA-1 collision, 2013-08-23) which depend
> implicitly upon SHA-1 without actually hardcoding any hashes? The test
> added by 66ae9a57b8, for instance, won't start failing in the face of
> NewHash, but it also won't be testing anything meaningful.
> 
> Should such tests be dropped altogether? Should they be marked with a
> 'SHA1' predicate or be annotated with a comment as being
> SHA-1-specific? Something else?

My plan for these was to treat them the same way as for git rev-parse
and git hash-object.  Basically, for the moment, I had planned to ignore
them, although I like the idea for a prerequisite to get the full
testsuite passing in the interim.

Ultimately, we could use some sort of lookup table or a test helper to
translate them so that we get functionally equivalent results.  t1512 is
another great example of this same kind of test.
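Such a lookup table could be sketched as a small shell helper (the
helper name, environment variable, and table layout here are made up;
the two object IDs are the well-known empty-blob values for each
algorithm):

```shell
# Map abstract names to per-algorithm object IDs so tests can stay
# hash-agnostic.
test_oid () {
	case "$TEST_HASH_ALGO,$1" in
	sha1,empty_blob)
		echo e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 ;;
	sha256,empty_blob)
		echo 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 ;;
	*)
		echo >&2 "unknown oid name: $1"; return 1 ;;
	esac
}

TEST_HASH_ALGO=sha1
test_oid empty_blob
```

A test would then compare against "$(test_oid empty_blob)" instead of a
hardcoded hash.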
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: get commit ID from a tree object ID

2018-03-26 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Sat, Mar 17, 2018 at 10:57 AM Junio C Hamano  wrote:
>> Jeff King  writes:

>>> If you want to dig further, you can use the diff machinery to show which
>>> commit introduced a particular tree, like:
>>>
>>>   git rev-list --all |
>>>   git diff-tree --stdin --pretty=raw --raw -t -r |
>>>   less +/$desired_tree
>>>
>>> That "less" will find the mentioned tree, and then you'll have to
>>> manually read the commit. It would be possible to do it mechanically
>>> with a short perl script, but I'll leave that as an exercise for the
>>> reader.
>
>> Before Stefan jumps in ;-) I wonder if a recently materialized
>> "find-object" option to the diff family can be used here as a
>> sugar-coated way.
>
> I am late to jump in, but testing the 'git log --find-object'
> seems to have issues with trees named by sha1 here,
> but the named tree via : still seems to work.

Experimenting a little more, I wondered if "-t" wasn't being passed
by default:

  $ git log --oneline -t --find-object=$(git rev-parse HEAD~30:Documentation/technical)
  $

No, that's not it.  Could it have to do with merges?

  $ git log --oneline -m --first-parent --find-object=$(git rev-parse HEAD~30:Documentation/technical)
  df6cfe8c30 Merge branch 'debian-experimental' into google3
  f86d5fd2e4 Merge branch 'debian-experimental' into google3

Yes.

That doesn't explain why : worked for you, though. :)

Thanks,
Jonathan


Re: [PATCH 04/10] t1405: sort reflog entries in a hash-independent way

2018-03-26 Thread Junio C Hamano
"brian m. carlson"  writes:

> The test enumerates reflog entries in an arbitrary order and then sorts
> them.  For SHA-1, this produces results that happen to sort in
> alphabetical order, but for other hash algorithms they sort differently.
> Ensure we sort the reflog entries in a hash-independent way by sorting
> on the ref name instead of the object ID.

Makes sense.

> Remove an assumption about the length of a hash by using cut with
> the delimiter and field options instead of the character range
> option.

I thought you used your truncated blake hash to develop and verify
these changes, but I'd imagine that the "42" thing were hard to
spot.

Thanks.


> Signed-off-by: brian m. carlson 
> ---
>  t/t1405-main-ref-store.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
> index e8115df5ba..a1e243a05c 100755
> --- a/t/t1405-main-ref-store.sh
> +++ b/t/t1405-main-ref-store.sh
> @@ -45,7 +45,7 @@ test_expect_success 'rename_refs(master, new-master)' '
>  '
>  
>  test_expect_success 'for_each_ref(refs/heads/)' '
> - $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
> + $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
>   cat >expected <<-\EOF &&
>   master 0x0
>   new-master 0x0
> @@ -71,7 +71,7 @@ test_expect_success 'verify_ref(new-master)' '
>  '
>  
>  test_expect_success 'for_each_reflog()' '
> - $RUN for-each-reflog | sort | cut -c 42- >actual &&
> + $RUN for-each-reflog | sort -k2 | cut -c 42- >actual &&
>   cat >expected <<-\EOF &&
>   HEAD 0x1
>   refs/heads/master 0x0


Re: get commit ID from a tree object ID

2018-03-26 Thread Stefan Beller
On Sat, Mar 17, 2018 at 10:57 AM Junio C Hamano  wrote:

> Jeff King  writes:

> > If you want to dig further, you can use the diff machinery to show which
> > commit introduced a particular tree, like:
> >
> >   git rev-list --all |
> >   git diff-tree --stdin --pretty=raw --raw -t -r |
> >   less +/$desired_tree
> >
> > That "less" will find the mentioned tree, and then you'll have to
> > manually read the commit. It would be possible to do it mechanically
> > with a short perl script, but I'll leave that as an exercise for the
> > reader.

> Before Stefan jumps in ;-) I wonder if a recently materialized
> "find-object" option to the diff family can be used here as a
> sugar-coated way.

I am late to jump in, but testing the 'git log --find-object'
seems to have issues with trees named by sha1 here,
but the named tree via : still seems to work.

It seems reasonable to be able to find trees using the find-object
flag, though.

Stefan


Re: [PATCH v2 01/36] t/helper: add an empty test-tool program

2018-03-26 Thread Johannes Schindelin
Hi Duy,

On Mon, 26 Mar 2018, Duy Nguyen wrote:

> On Mon, Mar 26, 2018 at 5:27 PM, Johannes Schindelin
>  wrote:
> >
> > On Sat, 24 Mar 2018, Nguyễn Thái Ngọc Duy wrote:
> >
> >> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> >> new file mode 100644
> >> index 00..c730f718ca
> >> --- /dev/null
> >> +++ b/t/helper/test-tool.c
> >> @@ -0,0 +1,27 @@
> >> +#include "git-compat-util.h"
> >> +#include "test-tool.h"
> >> +
> >> +struct test_cmd {
> >> + const char *name;
> >> + int (*main)(int argc, const char **argv);
> >
> > This makes the build fail on Windows, as we override `main` in
> > compat/mingw.h:
> 
> Sigh.. not complaining, but I wish somebody would try to compile git
> with wine (and automate it in travis). This way we could at least cover
> the compilation part for all major platforms. Probably too small for a
> GSoC (and making the test suite pass with wine may be too large for
> GSoC)

We do have Continuous Testing of maint, master, next & pu.

However, it seems that something is off, as
ba5bec9589e9eefe2446044657963e25b7c8d88e is reported as fine on Windows:
https://travis-ci.org/git/git/jobs/358260023 (while there is clearly a red
X next to that commit in
https://github.com/git/git/commits/ba5bec9589e9eefe2446044657963e25b7c8d88e,
that X is due to a hiccup on macOS).

It seems that the good-trees feature for Travis does not quite work as
intended. Gábor?

Ciao,
Dscho

Re: [GSoC] Convert “git stash” to builtin proposal

2018-03-26 Thread Johannes Schindelin
Hi Paul,

On Sun, 25 Mar 2018, Paul-Sebastian Ungureanu wrote:

> One thing I did not mention in the previous reply was that I also added
> a new paragraph to "Benefits to community" about 'git stash' being slow
> on Windows for a lot of users. I consider this alone to be a very good
> justification for this project and doing this project will be very
> beneficial for the Windows users.

Yes!

Thank you,
Johannes


Re: [PATCH v2] Makefile: detect compiler and enable more warnings in DEVELOPER=1

2018-03-26 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> The set of extra warnings we enable when DEVELOPER is set has to be
> conservative because we can't assume any compiler version the
> developer may use. Detect the compiler version so we know when it's
> safe to enable -Wextra and maybe more.

This is a good idea in general, but we are not quite ready without
some fixups.  

Here is a quick summary (not exhaustive) from my trial merge to 'pu'
(which will be reverted before today's integration is pushed out).

 - json-writer.c triggers -Werror=old-style-decl

 - t/helper/test-json-writer.c triggers -Werror=missing-prototypes
   quite a few times.

 - connect.c -Werror=implicit-fallthrough around die_initial_contact().



Re: Should I try to fix rebase interactive preserve-merges bug

2018-03-26 Thread Johannes Schindelin
Hi Jake & Wink,

On Sun, 25 Mar 2018, Jacob Keller wrote:

> On Sun, Mar 25, 2018 at 10:32 AM, Wink Saville  wrote:
> > There is a "TODO known breakage" in t3404-rebase-interactive.sh:
> >
> >not ok 24 - exchange two commits with -p # TODO known breakage
> >
> > I'm contemplating trying to fix it. But with --recreate-merges coming
> > maybe it's not worth the effort. Should I proceed with attempting a
> > fix or is --preserve-merges going to be a synonym for
> > --recreate-merges?
> 
> AFAIK this breakage of preserve-merges is a design flaw which isn't
> really fixable, which is why --recreate-merges is being added.
> 
> I believe the plan is to deprecate preserve-merges once
> recreate-merges has landed.

Indeed, my plan is to deprecate preserve-merges once recreate-merges
landed and proved robust.

The problem with the `pick` overloading in preserve-merges is many-fold:

- `pick` is used for merge commits, too. Therefore, the parents of the
  picked commits are *implicit*: a rebased commit will always have the
  rebased parents (or the original parents, for commits that were not
  (yet) rebased).

  The only way to fix this would be to break the current syntax, and
  thereby break existing users' scripts. (I know of a few users, and I
  know better than assuming that there are no power users out there
  scripting *everything* on top of Git, as I am such a power user
  myself).

- One might be tempted to special-case that implicit parent-keeping for
  the cases where merge commits are picked, and respect order for
  non-merge commits.

  However, *nothing* in the todo list (apart from the commit messages,
  which can easily be edited however) indicates whether a `pick` has a
  merge or a non-merge commit as parameter. That makes this "solution" a
  highly unintuitive and inconsistent one.

- Even worse: if you ever found yourself editing long-ish todo lists in an
  alternative of preserve-merges (such as Git for Windows' garden shears
  [*1*]), you will know that it is *important* to know when you cross a
  merge commit line. Reordering within a linear list of non-merge commits
  is relatively safe. Moving a `pick` across a merge commit line is
  definitely causing a lot of grief, generally speaking.

  In short: my hard-won experience tells me that a todo list *needs* a
  visual cue that lets users tell apart the sequences of non-merge commits
  from the merge commit lines.
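For illustration (a hypothetical todo list with made-up hashes, using the `merge -C` syntax from the proposed recreate-merges series, not the output of any released git version), compare how the two styles present the same history:

```
# preserve-merges: a merge is just another `pick`; only the message hints at it
pick 1234567 Fix typo in README
pick 89abcde Merge branch 'feature'

# recreate-merges style: the merge line itself is the visual cue
pick 1234567 Fix typo in README
merge -C 89abcde feature
```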

So there is more than just a technical problem with the design of the
preserve-merges feature. Its UI is fundamentally broken.

Which makes sense, as I intended preserve-merges to be non-interactive
only, and there was a single reason why it was piggy-backed on top of the
interactive rebase: this was the easiest way to implement that feature. I
was never a fan of enabling -p with -i, and I am fairly certain that I
warned against this. Turns out I was right. Strange, I know.

Ciao,
Johannes

Footnote *1*:
https://github.com/git-for-windows/build-extra/blob/master/shears.sh


Re: [PATCH v3] Allow use of TLS 1.3

2018-03-26 Thread Daniel Stenberg

On Mon, 26 Mar 2018, Johannes Schindelin wrote:

> Can we *please* also add that OpenSSL 1.1.* is required (or that cURL is
> built with NSS or BoringSSL as the TLS backend)?


We might consider adding a way to extract that info from curl to make that 
work really well for you. There are now six TLS libraries that support TLS 1.3 
and it might be hard for git to figure out the exact situation for each 
library and keep track of these moving targets...


--

 / daniel.haxx.se


Re: git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-26 Thread Ævar Arnfjörð Bjarmason

On Mon, Mar 26 2018, Rafael Ascensao wrote:

> One of the tools that manages PKGBUILDS for Arch Linux stores PKGBUILD
> git repos inside a cache directory for reuse.
>
> One of the repos is triggering some unexpected behaviour that can be
> reproduced in the CLI with:
>
>   $ GIT_DIR=spotify/.git GIT_WORK_TREE=spotify git reset HEAD
>   fatal: couldn't read spotify/.git/packed-refs: Not a directory
>
> Moving up one directory works as expected.
>   $ cd ..
>   $ GIT_DIR=sync/spotify/.git GIT_WORK_TREE=sync/spotify git reset HEAD
>
> Using -C seems to work fine too.
>   $ git -C spotify reset HEAD
>
> $ GIT_TRACE_SETUP=2 GIT_TRACE=2 GIT_DIR="spotify/.git"
> GIT_WORK_TREE="spotify" git reset HEAD
> 22:10:53.020525 trace.c:315 setup: git_dir: spotify/.git
> 22:10:53.020605 trace.c:316 setup: git_common_dir: spotify/.git
> 22:10:53.020616 trace.c:317 setup: worktree:
> /home/rafasc/.cache/aurutils/sync/spotify
> 22:10:53.020625 trace.c:318 setup: cwd:
> /home/rafasc/.cache/aurutils/sync
> 22:10:53.020635 trace.c:319 setup: prefix: (null)
> 22:10:53.020644 git.c:344   trace: built-in: git 'reset' 'HEAD'
> fatal: couldn't read spotify/.git/packed-refs: Not a directory
>
> The issue seems to bisect to f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0
> I can't pinpoint why this particular repo is behaving differently.

CC-ing Duy, the author of that commit.


Re: Including object type and size in object id (Re: Git Merge contributor summit notes)

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 5:00 PM, Jonathan Nieder wrote:
> Jeff Hostetler wrote:
> [long quote snipped]
>
>> While we are converting to a new hash function, it would be nice
>> if we could add a couple of fields to the end of the OID:  the object
>> type and the raw uncompressed object size.
>>
>> It would be nice if we could extend the OID to include 6 bytes of data
>> (4 or 8 bits for the type and the rest for the raw object size), and
>> just say that an OID is a {hash,type,size} tuple.
>>
>> There are lots of places where we open an object to see what type it is
>> or how big it is.  This requires uncompressing/undeltafying the object
>> (or at least decoding enough to get the header).  In the case of missing
>> objects (partial clone or a gvfs-like projection) it requires either
>> dynamically fetching the object or asking an object-size-server for the
>> data.
>>
>> All of these cases could be eliminated if the type/size were available
>> in the OID.
>
> This implies a limit on the object size (e.g. 5 bytes in your
> example).  What happens when someone wants to encode an object larger
> than that limit?

I could say add a full uint64 to the tail end of the hash, but
we currently don't handle blobs/objects larger than 4GB right now
anyway, right?

5 bytes for the size is just a compromise -- 1TB blobs would be
terrible to think about...

> This also decreases the number of bits available for the hash, but
> that shouldn't be a big issue.

I was suggesting extending the OIDs by 6 bytes while we are changing
the hash function.

> Aside from those two, I don't see any downsides.  It would mean that
> tree objects contain information about the sizes of blobs contained
> there, which helps with virtual file systems.  It's also possible to
> do that without putting the size in the object id, but maybe having it
> in the object id is simpler.
>
> Will think more about this.
>
> Thanks for the idea,
> Jonathan



Thanks
Jeff



Re: [PATCH v3] Allow use of TLS 1.3

2018-03-26 Thread Johannes Schindelin
Hi Logan,

On Mon, 26 Mar 2018, Loganaden Velvindron wrote:

> Add a tlsv1.3 option to http.sslVersion in addition to the existing
> tlsv1.[012] options. libcurl has supported this since 7.52.0.
> 
> Signed-off-by: Loganaden Velvindron 

Can we *please* also add that OpenSSL 1.1.* is required (or that cURL is
built with NSS or BoringSSL as the TLS backend)?

See
https://public-inbox.org/git/nycvar.qro.7.76.6.1803240035300...@zvavag-6oxh6da.rhebcr.pbec.zvpebfbsg.pbz/
for my original plea.

I deem this information *really* important because a lot of Git packages
are still built against OpenSSL 1.0.2 (e.g. Git for Windows) and *won't*
benefit immediately from your patch.

Ciao,
Johannes


[PATCH 2/2] mingw/msvc: use the new-style RUNTIME_PREFIX helper

2018-03-26 Thread Johannes Schindelin
This change also allows us to stop overriding argv[0] with the absolute
path of the executable, allowing us to preserve e.g. the case of the
executable's file name.

This fixes https://github.com/git-for-windows/git/issues/1496 partially.

Signed-off-by: Johannes Schindelin 
---
 compat/mingw.c   | 5 ++---
 config.mak.uname | 2 ++
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index a67872b..6ded1c8 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2221,7 +2221,7 @@ void mingw_startup(void)
die_startup();
 
/* determine size of argv and environ conversion buffer */
-   maxlen = wcslen(_wpgmptr);
+   maxlen = wcslen(wargv[0]);
for (i = 1; i < argc; i++)
maxlen = max(maxlen, wcslen(wargv[i]));
for (i = 0; wenv[i]; i++)
@@ -2241,8 +2241,7 @@ void mingw_startup(void)
buffer = malloc_startup(maxlen);
 
/* convert command line arguments and environment to UTF-8 */
-   __argv[0] = wcstoutfdup_startup(buffer, _wpgmptr, maxlen);
-   for (i = 1; i < argc; i++)
+   for (i = 0; i < argc; i++)
__argv[i] = wcstoutfdup_startup(buffer, wargv[i], maxlen);
for (i = 0; wenv[i]; i++)
environ[i] = wcstoutfdup_startup(buffer, wenv[i], maxlen);
diff --git a/config.mak.uname b/config.mak.uname
index e1cfe5e..a6e734c 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -357,6 +357,7 @@ ifeq ($(uname_S),Windows)
SNPRINTF_RETURNS_BOGUS = YesPlease
NO_SVN_TESTS = YesPlease
RUNTIME_PREFIX = YesPlease
+   HAVE_WPGMPTR = YesWeDo
NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
NO_NSEC = YesPlease
USE_WIN32_MMAP = YesPlease
@@ -506,6 +507,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_SVN_TESTS = YesPlease
NO_PERL_MAKEMAKER = YesPlease
RUNTIME_PREFIX = YesPlease
+   HAVE_WPGMPTR = YesWeDo
NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
NO_NSEC = YesPlease
USE_WIN32_MMAP = YesPlease
-- 
2.7.4



[PATCH 1/2] exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows

2018-03-26 Thread Johannes Schindelin
The RUNTIME_PREFIX feature comes from Git for Windows, but it was
enhanced to allow support for other platforms. While changing the
original idea, the concept was also improved by not forcing argv[0] to
be adjusted.

Let's allow the same for Windows by implementing a helper just as for
the other platforms.

Signed-off-by: Johannes Schindelin 
---
 Makefile   |  8 
 exec_cmd.c | 22 ++
 2 files changed, 30 insertions(+)

diff --git a/Makefile b/Makefile
index f84e816..c944168 100644
--- a/Makefile
+++ b/Makefile
@@ -453,6 +453,10 @@ all::
 # When using RUNTIME_PREFIX, define HAVE_NS_GET_EXECUTABLE_PATH if your 
platform
 # supports calling _NSGetExecutablePath to retrieve the path of the running
 # executable.
+#
+# When using RUNTIME_PREFIX, define HAVE_WPGMPTR if your platform offers
+# the global variable _wpgmptr containing the absolute path of the current
+# executable (this is the case on Windows).
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1693,6 +1697,10 @@ ifdef HAVE_NS_GET_EXECUTABLE_PATH
BASIC_CFLAGS += -DHAVE_NS_GET_EXECUTABLE_PATH
 endif
 
+ifdef HAVE_WPGMPTR
+   BASIC_CFLAGS += -DHAVE_WPGMPTR
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
diff --git a/exec_cmd.c b/exec_cmd.c
index 38d52d9..6e114f8 100644
--- a/exec_cmd.c
+++ b/exec_cmd.c
@@ -144,6 +144,24 @@ static int git_get_exec_path_darwin(struct strbuf *buf)
 }
 #endif /* HAVE_NS_GET_EXECUTABLE_PATH */
 
+#ifdef HAVE_WPGMPTR
+/*
+ * Resolves the executable path by using the global variable _wpgmptr.
+ *
+ * Returns 0 on success, -1 on failure.
+ */
+static int git_get_exec_path_wpgmptr(struct strbuf *buf)
+{
+   int len = wcslen(_wpgmptr) * 3 + 1;
+   strbuf_grow(buf, len);
+   len = xwcstoutf(buf->buf, _wpgmptr, len);
+   if (len < 0)
+   return -1;
+   buf->len += len;
+   return 0;
+}
+#endif /* HAVE_WPGMPTR */
+
 /*
  * Resolves the absolute path of the current executable.
  *
@@ -178,6 +196,10 @@ static int git_get_exec_path(struct strbuf *buf, const 
char *argv0)
git_get_exec_path_procfs(buf) &&
 #endif /* PROCFS_EXECUTABLE_PATH */
 
+#ifdef HAVE_WPGMPTR
+   git_get_exec_path_wpgmptr(buf) &&
+#endif /* HAVE_WPGMPTR */
+
git_get_exec_path_from_argv0(buf, argv0)) {
return -1;
}
-- 
2.7.4



[PATCH 0/2] Add Windows support to the new RUNTIME_PREFIX design

2018-03-26 Thread Johannes Schindelin
Even if the RUNTIME_PREFIX feature originates from Git for Windows, the
current patch series is different enough in its design that it leaves the
Windows-specific RUNTIME_PREFIX handling in place: On Windows, we still
have to override argv[0] with the absolute path of the current `git`
executable.

Let's just port the Windows-specific code over to the new design and get
rid of that argv[0] overwriting.

This also partially addresses a very obscure problem reported on the Git
for Windows bug tracker, where misspelling a builtin command using a
creative mIxEd-CaSe version could lead to an infinite ping-pong between
git.exe and Git for Windows' "Git wrapper" (that we use in place of
copies when on a file system without hard-links, most notably FAT).

Dan, I would be delighted if you could adopt these patches into your patch
series.

Johannes Schindelin (2):
  exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows
  mingw/msvc: use the new-style RUNTIME_PREFIX helper

 Makefile |  8 
 compat/mingw.c   |  5 ++---
 config.mak.uname |  2 ++
 exec_cmd.c   | 22 ++
 4 files changed, 34 insertions(+), 3 deletions(-)

-- 
2.7.4



git complains packed-refs is not a directory when used with GIT_DIR and GIT_WORK_TREE envvars.

2018-03-26 Thread Rafael Ascensao
One of the tools that manages PKGBUILDS for Arch Linux stores PKGBUILD
git repos inside a cache directory for reuse.

One of the repos is triggering some unexpected behaviour that can be
reproduced in the CLI with:

  $ GIT_DIR=spotify/.git GIT_WORK_TREE=spotify git reset HEAD
  fatal: couldn't read spotify/.git/packed-refs: Not a directory

Moving up one directory works as expected.
  $ cd ..
  $ GIT_DIR=sync/spotify/.git GIT_WORK_TREE=sync/spotify git reset HEAD

Using -C seems to work fine too.
  $ git -C spotify reset HEAD

$ GIT_TRACE_SETUP=2 GIT_TRACE=2 GIT_DIR="spotify/.git"
GIT_WORK_TREE="spotify" git reset HEAD
22:10:53.020525 trace.c:315 setup: git_dir: spotify/.git
22:10:53.020605 trace.c:316 setup: git_common_dir: spotify/.git
22:10:53.020616 trace.c:317 setup: worktree:
/home/rafasc/.cache/aurutils/sync/spotify
22:10:53.020625 trace.c:318 setup: cwd:
/home/rafasc/.cache/aurutils/sync
22:10:53.020635 trace.c:319 setup: prefix: (null)
22:10:53.020644 git.c:344   trace: built-in: git 'reset' 'HEAD'
fatal: couldn't read spotify/.git/packed-refs: Not a directory

The issue seems to bisect to f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0
I can't pinpoint why this particular repo is behaving differently.

Cumprimentos,
Rafael Ascensão


Re: Per-object encryption (Re: Git Merge contributor summit notes)

2018-03-26 Thread Ævar Arnfjörð Bjarmason

On Mon, Mar 26 2018, Jonathan Nieder wrote:

> Hi Ævar,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> It occurred to me recently that once we have such a layer it could be
>> (ab)used with some relatively minor changes to do any arbitrary
>> local-to-remote object content translation, unless I've missed something
>> (but I just re-read hash-function-transition.txt now...).
>>
>> E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
>> remote server so that you upload a GPG encrypted version of all your
>> blobs, and have your trees reference those blobs.
>
> Interesting!
>
> To be clear, this would only work with deterministic encryption.
> Normal GPG encryption would not have the round-tripping properties
> required by the design.

Right, sorry. I was being lazy. For simplicity let's say rot13 or some
other deterministic algorithm.

> If I understand correctly, it also requires both sides of the
> connection to have access to the encryption key.  Otherwise they
> cannot perform ordinary operations like revision walks.  So I'm not
> seeing a huge advantage over ordinary transport-layer encryption.
>
> That said, it's an interesting idea --- thanks for that.  I'm changing
> the subject line since otherwise there's no way I'll find this again. :)

In this specific implementation I have in mind only one side would have
the key, we'd encrypt just up to the point where the repository would
still pass fsck. But of course once we had that facility we could do any
arbitrary translation.

I.e. consider the latest commit in git.git:

commit 90bbd502d54fe920356fa9278055dc9c9bfe9a56
tree 5539308dc384fd11055be9d6a0cc1cce7d495150
parent 085f5f95a2723e8f9f4d037c01db5b786355ba49
parent d32eb83c1db7d0a8bb54fe743c6d1dd674d372c5
author Junio C Hamano  1521754611 -0700
committer Junio C Hamano  1521754611 -0700

Sync with Git 2.16.3

With rot13 "encryption" it would be:

commit 
tree 
parent 
parent 
author Whavb P Unznab  1521754611 -0700
committer Whavb P Unznab  1521754611 -0700

Flap jvgu Tvg 2.16.3

And an ls-tree on that tree hash would instead of README.md give you:

100644 blob  ERNQZR.zq

And inspecting that blob would give you:

# Rot13'd "Hello, World!"
Uryyb, Jbeyq!

So obviously for the encryption use-case such a repo would leak a lot of
info compared to just uploading the fast-export version of it
periodically as one big encrypted blob to store somewhere, but the
advantage would be:

 * It's better than existing "just munge the blobs" encryption solutions
   bolted on top of git, because at least you encrypt the commit
   message, author names & filenames.

 * Since it would be a valid repo even without the key, you could use
   git hosting solutions for it, similar to checking in encrypted blobs
   in existing git repos.

 * As noted, it could be a permanent stress test on the SHA-1<->NewHash
   codepath.

   I can't think of a reason for why once we have that we couldn't add
   the equivalent of clean/smudge filters.

   We need to unpack & repack & re-hash all the stuff we send over the
   wire anyway, so we can munge it as it goes in/out as long as the same
   input values always yield the same output values.


Re: [PATCH v7 0/3] RUNTIME_PREFIX relocatable Git

2018-03-26 Thread Johannes Schindelin
Hi,

On Mon, 26 Mar 2018, Daniel Jacques wrote:

> On Mon, Mar 26, 2018 at 2:01 AM Junio C Hamano  wrote:
> 
> > I wonder if the relocatable Git would allow a simpler arrangement to
> > test without installing.
> 
> > I am asking merely out of curiosity, not suggesting to make a trial
> > install somewhere in the build area and run the built Git normally
> > without GIT_EXEC_PATH trick.
> 
> RUNTIME_PREFIX resolves paths relative to the runtime path of the Git
> binary. These path expectations are constructed around installation
> directories, so I'd expect that installation is a prerequisite of testing.

Indeed. This is the relevant part of the code:

	if (!prefix &&
	    !(prefix = strip_path_suffix(argv0_path, GIT_EXEC_PATH)) &&
	    !(prefix = strip_path_suffix(argv0_path, BINDIR)) &&
	    !(prefix = strip_path_suffix(argv0_path, "git"))) {
		prefix = FALLBACK_RUNTIME_PREFIX;
		trace_printf("RUNTIME_PREFIX requested, "
			     "but prefix computation failed.  "
			     "Using static fallback '%s'.\n", prefix);
	}

Note how the argv0_path (which is the absolute path of the directory
*containing* the `git` executable) is tested for several suffixes, i.e.
trailing directories, namely

libexec/git-core
bin
git

That means that you will have to have your `git` executable built in a
worktree whose absolute path ends in one of these.

While writing this reply, I was wondering why "git" is included in this
list. You know, I can see libexec/git-core and bin because that is where
the `git` executable is installed to (or hard-linked to). But "git"?

Turns out that I am the responsible person for that (024aa7d8d51
(system_path(): simplify using strip_path_suffix(), and add suffix "git",
2009-02-19)), having assumed back then that everybody who uses the
RUNTIME_PREFIX feature and works on Git does so in /git/. Which is of
course no longer true in general. For example, I myself got bitten by this
when developing some patches on top of Dan's patch series in a *linked
worktree*  of the name "runtime-prefix". Oh well.

But the short answer is: no, you cannot rely on the RUNTIME_PREFIX feature
for running Git's own test suite. The GIT_EXEC_PATH method to force Git's
test suite to use the compiled executables is still required.

Even if it is fragile: if git-FOO exists in /bin/ and some test
relies on the FOO subcommand but we removed it from the source code to
test whether it is needed, the test suite would pass just fine because it
finds git-FOO in the PATH.

*sigh* seems that I cannot write short answers.

Ciao,
Dscho


[PATCH] git-stash.txt: remove extra square bracket

2018-03-26 Thread Thomas Gummerer
In 1ada5020b3 ("stash: use stash_push for no verb form", 2017-02-28),
when the pathspec argument was introduced in 'git stash', that was also
documented.  However I forgot to remove an extra square bracket after
the '--message' argument, even though the square bracket should have
been after the pathspec argument (where it was also added).

Remove the extra square bracket after the '--message' argument, to show
that the pathspec argument should be used with the 'push' verb.

While the pathspec argument can be used without the push verb, that's a
special case described later in the man page, and removing the first extra
square bracket instead of the second one makes the synopsis easier to
understand.

Signed-off-by: Thomas Gummerer 
---
 Documentation/git-stash.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 056dfb8661..7ef8c47911 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 'git stash' ( pop | apply ) [--index] [-q|--quiet] []
 'git stash' branch  []
 'git stash' [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
-[-u|--include-untracked] [-a|--all] [-m|--message ]]
+[-u|--include-untracked] [-a|--all] [-m|--message ]
 [--] [...]]
 'git stash' clear
 'git stash' create []
-- 
2.17.0.rc1.35.g90bbd502d5.dirty



Re: [PATCH 1/1] repository.h: add comment and clarify repo_set_gitdir

2018-03-26 Thread Stefan Beller
On Mon, Mar 26, 2018 at 2:04 PM Stefan Beller  wrote:



> On Fri, Mar 23, 2018 at 8:55 AM Nguyễn Thái Ngọc Duy 
wrote:
> >
> > The argument name "optional" may mislead the reader to think this
> > option could be NULL. But it can't be. While at it, document a bit
> > more about struct set_gitdir_args.
> >
> > Signed-off-by: Nguyễn Thái Ngọc Duy 

> Reviewed-by: Stefan Beller 

Uh, not so fast. Is it worthwhile to align the parameter name in
the c file as well?


Re: [PATCH 1/1] repository.h: add comment and clarify repo_set_gitdir

2018-03-26 Thread Stefan Beller
On Fri, Mar 23, 2018 at 8:55 AM Nguyễn Thái Ngọc Duy 
wrote:

> The argument name "optional" may mislead the reader to think this
> option could be NULL. But it can't be. While at it, document a bit
> more about struct set_gitdir_args.

> Signed-off-by: Nguyễn Thái Ngọc Duy 

Reviewed-by: Stefan Beller 


Including object type and size in object id (Re: Git Merge contributor summit notes)

2018-03-26 Thread Jonathan Nieder
(administrivia: please omit parts of the text you are replying to that
 are not relevant to the reply.  This makes it easier to see what you're
 replying to, especially in mail readers that don't hide quoted text by
 default.)

Hi Jeff,

Jeff Hostetler wrote:
[long quote snipped]

> While we are converting to a new hash function, it would be nice
> if we could add a couple of fields to the end of the OID:  the object
> type and the raw uncompressed object size.
>
> It would be nice if we could extend the OID to include 6 bytes of data
> (4 or 8 bits for the type and the rest for the raw object size), and
> just say that an OID is a {hash,type,size} tuple.
>
> There are lots of places where we open an object to see what type it is
> or how big it is.  This requires uncompressing/undeltafying the object
> (or at least decoding enough to get the header).  In the case of missing
> objects (partial clone or a gvfs-like projection) it requires either
> dynamically fetching the object or asking an object-size-server for the
> data.
>
> All of these cases could be eliminated if the type/size were available
> in the OID.

This implies a limit on the object size (e.g. 5 bytes in your
example).  What happens when someone wants to encode an object larger
than that limit?

This also decreases the number of bits available for the hash, but
that shouldn't be a big issue.

Aside from those two, I don't see any downsides.  It would mean that
tree objects contain information about the sizes of blobs contained
there, which helps with virtual file systems.  It's also possible to
do that without putting the size in the object id, but maybe having it
in the object id is simpler.

Will think more about this.

Thanks for the idea,
Jonathan


Re: [PATCH] Remove contrib/examples/*

2018-03-26 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> + * Doing `git log` on paths matching '*--helper.c' will show
> +   incremental effort in the direction of moving existing shell
> +   scripts to C.

It may be beneficial to remind readers of "--full-diff", e.g.

$ git log --full-diff --stat -p -- "${foo}--helper.c"

here.


Re: [RFC][PATCH] git-stash: convert git stash list to C builtin

2018-03-26 Thread Thomas Gummerer
On 03/26, Johannes Schindelin wrote:
> Hi Eric,
> 
> On Sun, 25 Mar 2018, Eric Sunshine wrote:
> 
> > On Sat, Mar 24, 2018 at 2:23 PM, Paul-Sebastian Ungureanu
> >  wrote:
> > > Currently, because git stash is not fully converted to C, I
> > > introduced a new helper that will hold the converted commands.
> > > ---
> > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> > > @@ -0,0 +1,52 @@
> > > +int cmd_stash__helper(int argc, const char **argv, const char *prefix)
> > > +{
> > > +   int cmdmode = 0;
> > > +
> > > +   struct option options[] = {
> > > +   OPT_CMDMODE(0, "list", ,
> > > +N_("list stash entries"), LIST_STASH),
> > > +   OPT_END()
> > > +   };
> > 
> > Is the intention that once git-stash--helper implements all 'stash'
> > functionality, you will simply rename git-stash--helper to git-stash?
> > If so, then I'd think that you'd want it to accept subcommand
> > arguments as bare words ("apply", "drop") in order to be consistent
> > with the existing git-stash command set, not in dashed form
> 
> Why not start with cmdmode, and then add a single patch that *also*
> accepts argv[1] as a bare-word cmdmode?

I don't think we should accept the dashed form of the commands for
'git stash'.  The main reason being that we also have 'git stash'
without any arguments, which acts as 'git stash push'.  So if we would
ever come up with an argument to 'git stash push', that matches one of
the current verbs, or if we come up with a new verb that matches one
of the options to 'git stash push', that would not work.

In that case we could obviously go for a different word, but I think
the rules when 'git stash' is going to be 'git stash push' and when it
is not are already complicated enough, and allowing the verbs as
dashed options would only make the interface more complicated.

> This could even potentially be a patch to parse-options.[ch] that
> introduces, say, PARSE_OPT_ALLOW_BARE_CMDMODE.

Now if we'd take that one step further and make it
PARSE_OPT_BARE_CMDMODE, which would only allow the non-dashed options,
I think we could use that in other places in git as well (for example
in 'git worktree').

> Ciao,
> Dscho


Per-object encryption (Re: Git Merge contributor summit notes)

2018-03-26 Thread Jonathan Nieder
Hi Ævar,

Ævar Arnfjörð Bjarmason wrote:

> It occurred to me recently that once we have such a layer it could be
> (ab)used with some relatively minor changes to do any arbitrary
> local-to-remote object content translation, unless I've missed something
> (but I just re-read hash-function-transition.txt now...).
>
> E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
> remote server so that you upload a GPG encrypted version of all your
> blobs, and have your trees reference those blobs.

Interesting!

To be clear, this would only work with deterministic encryption.
Normal GPG encryption would not have the round-tripping properties
required by the design.

If I understand correctly, it also requires both sides of the
connection to have access to the encryption key.  Otherwise they
cannot perform ordinary operations like revision walks.  So I'm not
seeing a huge advantage over ordinary transport-layer encryption.

That said, it's an interesting idea --- thanks for that.  I'm changing
the subject line since otherwise there's no way I'll find this again. :)

Jonathan


Re: [PATCH v4 06/11] pack-objects: move in_pack out of struct object_entry

2018-03-26 Thread Stefan Beller
Hi,

sorry for the late review, as I am pointed here indirectly via
https://public-inbox.org/git/xmqqy3iebpsw@gitster-ct.c.googlers.com/

On Fri, Mar 16, 2018 at 11:33 AM Nguyễn Thái Ngọc Duy 
wrote:

> +LIMITATIONS
> +---
> +
> +This command could only handle 16384 existing pack files at a time.

s/could/can/ ?

> @@ -3191,6 +3200,9 @@ int cmd_pack_objects(int argc, const char **argv,
const char *prefix)
>  }
>  }

> +   /* make sure IN_PACK(0) return NULL */

I was confused for a while staring at this comment; s/0/NULL/
would have helped me.

> +static inline unsigned int oe_add_pack(struct packing_data *pack,
> +  struct packed_git *p)
> +{
> +   if (pack->in_pack_count >= (1 << OE_IN_PACK_BITS))
> +   die(_("too many packs to handle in one go. "
> + "Please add .keep files to exclude\n"
> + "some pack files and keep the number "
> + "of non-kept files below %d."),
> +   1 << OE_IN_PACK_BITS);

The packs are indexed 0..N-1, so we can actually handle N
packs I presume. But if we actually have N, then we'd run the

   /* make sure IN_PACK(0) return NULL */
   oe_add_pack(.., NULL);

as N+1, hence the user can only do N-1 ?

Oh wait! the code below makes me think we index from 1..N,
treating index 0 special as uninitialized? So we actually can only
store N-1 ?


> +   if (p) {
> +   if (p->index > 0)

s/>/!=/ ?

The new index variable is only used in these three
inlined header functions, and in_pack_count is strictly
positive, so index as well as in_pack_count could be made
unsigned?

Given that oe_add_pack returns an unsigned, I would actually
prefer to have in_pack_count an unsigned as well.

> +   die("BUG: this packed is already indexed");
> +   p->index = pack->in_pack_count;
> +   }
> +   pack->in_pack[pack->in_pack_count] = p;
> +   return pack->in_pack_count++;
> +}
> +
> +static inline struct packed_git *oe_in_pack(const struct packing_data
*pack,
> +   const struct object_entry *e)
> +{
> +   return pack->in_pack[e->in_pack_idx];
> +
> +}

extra blank line after the return?

> +static inline void oe_set_in_pack(struct object_entry *e,
> + struct packed_git *p)
> +{
> +   if (p->index <= 0)
> +   die("BUG: found_pack should be NULL "
> +   "instead of having non-positive index");

Do we also want to guard against
 p->index > (1 << OE_IN_PACK_BITS)
here? Also there is a BUG() macro, which would be better
as it reports the file/line number, but we cannot use it here as
it is an inline function in a header.


Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Junio C Hamano
Wink Saville  writes:

> Should we add a "_Static_assert" that sizeof(uintmax_t) >= sizeof(uint64_t) ?

If that expression compiles, then both types are understood by the
platform.  Because

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stdint.h.html

tells us:

Greatest-width integer types

The following type designates a signed integer type capable of
representing any value of any signed integer type: intmax_t

The following type designates an unsigned integer type capable of
representing any value of any unsigned integer type: uintmax_t

These types are required.

we know what that expression evaluates to, no?



Re: [PATCH 00/27] sb/object-store updates

2018-03-26 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> Thanks for driving this when I was away!
>>
>> With the fixup patch, both series are
>> Reviewed-by: Stefan Beller 
>
> I think everybody involved agrees that these two you cited above are
> already in good shape.  Let's have them in 'next' for the remainder
> of this cycle and go incremental, and merge them to 'master' soon
> after 2.17 final is tagged.
>
> Thanks all for working well together.  

Just FYI, when merging the topic nd/pack-objects-pack-struct to a
codebase with these two topics, because the former adds a handful of
inline functions that wants to see packed_git but all existing
sources have relied on the rule that cache.h is included early and
the fact that packed_git was declared in there, the change in
sb/object-store to move the structure declaration to object-store.h
causes quite a trouble.  I plan to resolve this with an evil merge
to include object-store.h from pack-objects.h to resolve this with
minimum damage to topics in flight.




Re: [PATCH v2 1/6] stash: Add tests for passing in too many refs

2018-03-26 Thread Johannes Schindelin
Hi Joel,

On Sun, 25 Mar 2018, Joel Teichroeb wrote:

> Signed-off-by: Joel Teichroeb 

I could imagine that the commit message would benefit from this body:

In preparation for converting the stash command incrementally to
a builtin command, this patch improves test coverage of the option
parsing.

> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index aefde7b172..7146e27bb5 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -45,6 +45,12 @@ test_expect_success 'applying bogus stash does nothing' '
>   test_cmp expect file
>  '
>  
> +test_expect_success 'applying with too many agruments does nothing' '
> + test_must_fail git stash apply stash@{0} bar &&
> + echo 1 >expect &&
> + test_cmp expect file
> +'

I suppose you encountered a problem where `stash apply a b` would modify
the file?

And if you really want to verify that the command does nothing, I guess
you will have to use

test-chmtime =123456789 file &&
test_must_fail git stash apply stash@{0} bar &&
test 123456789 = $(test-chmtime -v +0 file | sed 's/[^0-9].*$//')

> @@ -97,6 +103,10 @@ test_expect_success 'stash drop complains of extra 
> options' '
>   test_must_fail git stash drop --foo
>  '
>  
> +test_expect_success 'stash drop complains with too many refs' '
> + test_must_fail git stash drop stash@{1} stash@{2}

I wonder whether you might want to verify that the error message is
printed, e.g. via

test_must_fail git stash drop stash@{1} stash@{2} 2>err &&
test_i18ngrep "Too many" err

Also, since the added tests look very similar, it might make sense to use
a loop (with fixed revision arguments).

Ciao,
Dscho


Re: [RFC][PATCH] git-stash: convert git stash list to C builtin

2018-03-26 Thread Johannes Schindelin
Hi Eric,

On Sun, 25 Mar 2018, Eric Sunshine wrote:

> On Sat, Mar 24, 2018 at 2:23 PM, Paul-Sebastian Ungureanu
>  wrote:
> > Currently, because git stash is not fully converted to C, I
> > introduced a new helper that will hold the converted commands.
> > ---
> > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> > @@ -0,0 +1,52 @@
> > +int cmd_stash__helper(int argc, const char **argv, const char *prefix)
> > +{
> > +   int cmdmode = 0;
> > +
> > +   struct option options[] = {
> > +   OPT_CMDMODE(0, "list", &cmdmode,
> > +N_("list stash entries"), LIST_STASH),
> > +   OPT_END()
> > +   };
> 
> Is the intention that once git-stash--helper implements all 'stash'
> functionality, you will simply rename git-stash--helper to git-stash?
> If so, then I'd think that you'd want it to accept subcommand
> arguments as bare words ("apply", "drop") in order to be consistent
> with the existing git-stash command set, not in dashed form

Why not start with cmdmode, and then add a single patch that *also*
accepts argv[1] as a bare-word cmdmode?

This could even potentially be a patch to parse-options.[ch] that
introduces, say, PARSE_OPT_ALLOW_BARE_CMDMODE.

Ciao,
Dscho


Re: [PATCH 2/2] doc hash-function-transition: clarify what SHAttered means

2018-03-26 Thread Eric Sunshine
On Mon, Mar 26, 2018 at 2:27 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Attempt to clarify what the SHAttered attack means in practice for
> Git. The previous version of the text made no mention whatsoever of
> Git already having a mitigation for this specific attack, which the
> SHAttered researchers claim will detect cryptanalytic collision
> attacks.
> [...]
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> diff --git a/Documentation/technical/hash-function-transition.txt 
> b/Documentation/technical/hash-function-transition.txt
> @@ -28,11 +28,30 @@ advantages:
>  Over time some flaws in SHA-1 have been discovered by security
> +researchers. On 23 February 2017 the SHAttered attack
> +(https://shattered.io) demonstrated a practical SHA-1 hash collision.
> +
> +Git v2.13.0 and later subsequently moved to a hardened SHA-1
> +implementation by default, which isn't vulnerable to the SHAttered
> +attack.
> +
> +Thus Git has in effect already migrated to a new hash that isn't SHA-1
> +and doesn't share its vulnerabilities, its new hash function just
> +happens to produce exactly the same output for all known inputs,
> +except two PDFs published by the SHAttered researchers, and the new
> +implementation (written by those researchers) claims to detect future
> +cryptanalytic collision attacks.
> +
> +Regardless, it's considered prudent to move past any variant of SHA-1
> +to a new hash. There's no guarantee that future attacks on SHA-1 won't
> +be published in the future, and those attacks may not have viable
> +mitigations.
> +
> +If SHA-1 and its variants were to be truly broken Git's hash function

s/broken/&,/

> +could not be considered cryptographically secure any more. This would
> +impact the communication of hash values because we could not trust
> +that a given hash value represented the known good version of content
> +that the speaker intended.


Re: query on git submodule (ignore)

2018-03-26 Thread Stefan Beller
On Sun, Mar 25, 2018 at 9:46 AM Junio C Hamano  wrote:

> prashant Nidgunde  writes:

> [cc: stefan, for his interest in improving 'git submodules']

> > Hello,
> >
> > I am new to this community ,so please ignore if I am asking anything
silly.
> >
> > Case :
> > Today when I built my submodule , and did a git status , it shows as
modified.
> >
> > After reading certain suggestions on web i found out that i can ignore
> > that adding a line in .gitmodules
> >
> > But, I had to add that line manually ( which could be errorprone
> > because of typos )
> >
> >
> > Question:
> > 1. Is it feasible to build a feature like :
> >git submodule "zlib" ignore dirty ( which will
> > ignore submodule zlib when its built and dirty  as it has new files in
> > its directory)

> How does it prevent you from saying

>  git submodule "glib" ignore dirty

> when you really meant "zlib"?  How is the command supposed to know
> that you did *not* mean "glib", which may currently not exist in the
> index nor in the working tree yet but you are about to create, and
> doing the "ignore dirty" configuration as an earlier step of
> multiple steps to add a submodule?

> I personally doubt that the main issue you should be concerned about
> is feasibility.  A larger issue is how it is supposed to help, iow,
> is such a "feature" useful in the first place?

> Whenever you hear yourself say "I have to do X manually, and I can
> make mistakes. Can a command do it instead?", you have to ask
> yourself: what pieces of information do you give to that command,
> and how do you ensure you do not make typos on that command line?

> Besides, the above syntax would not work.  What would a user do when
> a submodule called "add" exists, for example?

I would think this can be solved by reordering the command to be

   git submodule ignore [--option-for-granularity=dirty] [-- <path>]

However for now this would be a shallow wrapper to

   name=$(git submodule--helper name <path>)
   git config submodule.$name.ignore dirty

if I understand the requested feature correctly?
Instead of looking up the name for each submodule path
(btw, name and path are most often the same anyway;
I just put it here for correctness), one could also use

   git submodule foreach -- \
   git -C .. config submodule.$name.ignore dirty

as the foreach already populates the $name variable.

Stefan


Re: Git Merge contributor summit notes

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 1:56 PM, Stefan Beller wrote:

On Mon, Mar 26, 2018 at 10:33 AM Jeff Hostetler 
wrote:




On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote:


On Sat, Mar 10 2018, Alex Vandiver wrote:


New hash (Stefan, etc)
--
   - discussed on the mailing list
   - actual plan checked in to

Documentation/technical/hash-function-transition.txt

   - lots of work renaming
   - any actual work with the transition plan?
   - local conversion first; fetch/push have translation table
   - like git-svn
   - also modified pack and index format to have lookup/translation

efficiently

   - brian's series to eliminate SHA1 strings from the codebase
   - testsuite is not working well because hardcoded SHA1 values
   - flip a bit in the sha1 computation and see what breaks in the

testsuite

   - will also need a way to do the conversion itself; traverse and

write out new version

   - without that, can start new repos, but not work on old ones
   - on-disk formats will need to change -- something to keep in mind

with new index work

   - documentation describes packfile and index formats
   - what time frame are we talking?
   - public perception question
   - signing commits doesn't help (just signs commit object) unless you

"recursive sign"

   - switched to SHA1dc; we detect and reject known collision technique
   - do it now because it takes too long if we start when the collision

drops

   - always call it "new hash" to reduce bikeshedding
   - is translation table a backdoor? has it been reviewed by crypto

folks?

 - no, but everything gets translated
   - meant to avoid a flag day for entire repositories
   - linus can decide to upgrade to newhash; if pushes to server that

is not newhash aware, that's fine

   - will need a wire protocol change
   - v2 might add a capability for newhash
   - "now that you mention md5, it's a good idea"
   - can use md5 to test the conversion
   - is there a technical reason for why not /n/ hashes?
   - the slow step goes away as people converge to the new hash
   - beneficial to make up some fake hash function for testing
   - is there a plan on how we decide which hash function?
   - trust junio to merge commits when appropriate
   - conservancy committee explicitly does not make code decisions
   - waiting will just give better data
   - some hash functions are in silicon (e.g. microsoft cares)
   - any movement in libgit2 / jgit?
 - basic stuff for libgit2; same testsuite problems
 - no work in jgit
   - most optimistic forecast?
 - could be done in 1-2y
   - submodules with one hash function?
 - unable to convert project unless all submodules are converted
 - OO-ing is not a prereq


Late reply, but one thing I brought up at the time is that we'll want to
keep this code around even after the NewHash migration at least for
testing purposes, should we ever need to move to NewNewHash.

It occurred to me recently that once we have such a layer it could be
(ab)used with some relatively minor changes to do any arbitrary
local-to-remote object content translation, unless I've missed something
(but I just re-read hash-function-transition.txt now...).

E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
remote server so that you upload a GPG encrypted version of all your
blobs, and have your trees reference those blobs.

Because we'd be doing arbitrary translations for all of
commits/trees/blobs this could go further than other bolted-on
encryption solutions for Git. E.g. paths in trees could be encrypted
too, as well as all the content of the commit object that isn't parent
info & the like (but that would have different hashes).

Basically clean/smudge filters on steroids, but for every object in the
repo. Anyone who got a hold of it would still see the shape of the repo
& approximate content size, but other than that it wouldn't be more info
than they'd get via `fast-export --anonymize` now.

I mainly find it interesting because it presents an intersection between
a feature we might want to offer anyway, and something that would stress
the hash transition codepath going forward, to make sure it hasn't all
bitrotted by the time we'll need NewHash->NewNewHash.

Git hosting providers would hate it, but they should probably be
charging users by how much Michael Haggerty's git-sizer tool hates their
repo anyway :)




While we are converting to a new hash function, it would be nice
if we could add a couple of fields to the end of the OID:  the object
type and the raw uncompressed object size.


This would allow crafting invalid OIDs, i.e. the correct hash value with
the wrong object type. (This is a different kind of "invalid" compared to
today, where we either have or do not have the object named by the
hash value. If we don't have it, it may be just unknown to us, but not
"wrong".)


An invalid OID (such as a wrong object type) could be detected as soon
as we open the object and read the 

Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Wink Saville
> I was just going by what the reported compiler error message was.
> It said that "unsigned long" didn't match the uint64_t variable.
> And that made me nervous.
>
> If all of the platforms we build on define uintmax_t >= 64 bits,
> then it doesn't matter.
>
> If we do have a platform where uintmax_t is u32, then we'll have a
> lot more breakage than in just the new function I added.
>
> Thanks,
> Jeff

Should we add a "_Static_assert" that sizeof(uintmax_t) >= sizeof(uint64_t) ?


Re: [PATCH] branch -l: print useful info whilst rebasing a non-local branch

2018-03-26 Thread Jacob Keller
On Mon, Mar 26, 2018 at 12:44 AM, Eric Sunshine  wrote:
> On Mon, Mar 26, 2018 at 3:25 AM, Jeff King  wrote:
>> OK, so here's some patches. We could do the first three now, wait a
>> while before the fourth, and then wait a while (or never) on the fifth.
>>
>>   [1/5]: t3200: unset core.logallrefupdates when testing reflog creation
>>   [2/5]: t: switch "branch -l" to "branch --create-reflog"
>>   [3/5]: branch: deprecate "-l" option
>>   [4/5]: branch: drop deprecated "-l" option
>>   [5/5]: branch: make "-l" a synonym for "--list"
>
> The entire series looks good to me. FWIW,
>
> Reviewed-by: Eric Sunshine 

Same to me.

Reviewed-by: Jacob Keller 

Thanks,
Jake


Re: [PATCH 0/2] doc hash-function-transition: minor & major clarifications

2018-03-26 Thread Stefan Beller
On Mon, Mar 26, 2018 at 11:27 AM Ævar Arnfjörð Bjarmason 
wrote:

> Having read through the hash-function-transition.txt again, a couple
> of things jumped out at me:

> Ævar Arnfjörð Bjarmason (2):
>doc hash-function-transition: clarify how older gits die on NewHash

> We weren't accurately describing how "git status" would die on NewHash
> repos on new versions.

>doc hash-function-transition: clarify what SHAttered means

> I don't think we had a good summary of how SHA-1 vulnerabilities
> overlap with concerns Git has, now that we've moved to the hardened
> SHA-1.

> I may very well have gotten this new summary subtly wrong though. So
> please review.

>   .../technical/hash-function-transition.txt| 40 +++
>   1 file changed, 32 insertions(+), 8 deletions(-)

Both patches look good to me.

Stefan


Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 2:00 PM, Junio C Hamano wrote:

Jeff Hostetler  writes:


I am concerned that the above compiler error message says that uintmax_t
is defined as an "unsigned long" (which is defined as *at least* 32 bits,
but not necessarily 64.  But a uint64_t is defined as a "unsigned long long"
and guaranteed as a 64 bit value.


On a platform whose uintmax_t is u32, is it realistic to expect that
we would be able to use u64, even if we explicitly ask for it, in
the first place?

In other words, on a platform that handles uint64_t, I would expect
uintmax_t to be wide enough to hold an uint64_t value without
truncation.



I was just going by what the reported compiler error message was.
It said that "unsigned long" didn't match the uint64_t variable.
And that made me nervous.

If all of the platforms we build on define uintmax_t >= 64 bits,
then it doesn't matter.

If we do have a platform where uintmax_t is u32, then we'll have a
lot more breakage than in just the new function I added.

Thanks,
Jeff


Re: [GSoC] Re: Info: git log --oneline improvements

2018-03-26 Thread Stefan Beller
On Sat, Mar 24, 2018 at 4:15 AM Christian Couder

wrote:

> Hi,

> On Sat, Mar 24, 2018 at 9:41 AM, Pratik Karki 
wrote:
> >
> > Hi Christian and Johannes,
> >
> > Though I sent a mail earlier, saying I would like to submit another
> > proposal, I am now skeptical on re-writing another proposal as you
> > guys are only available mentor for GSoC

Is there a place/email where I can read up on the git log --oneline
improvements? I use the --oneline flag a lot and would be interested
in hearing what you plan there.

> Well Stefan Beller wrote in


https://public-inbox.org/git/cagz79kax5hip1wp3u60im__dm0gvh8nnd+byxg5oxmxrrkr...@mail.gmail.com/

> that he would be ok to co-mentor, but I am not sure which projects he
> would be ok to co-mentor. I just Cc'ed him so he can tell us more.

I would be happy to co-mentor any project by prioritizing some
review time on it.

Regarding rewriting shell scripts in C,
there have been GSoC students in the previous years, which may help you
understand the scope of the project better.

 git clone --mirror https://public-inbox.org/git
 git clone https://git.kernel.org/pub/scm/git/git.git/

and search via "git log --author=" for
Paul Tan, who converted git-am and git-pull into C
or Prathamesh Chavan, who rewrote git-submodule
to be in C, but the result is not upstream (yet).

I think there are more shell to C conversions, but I was involved
with these 2, hence giving these examples.

> > and I believe Git doesn't
> > select more than 2 proposals.

> Yeah, because mentoring is a lot of work, and doesn't always work out
> as well as we would expect, (mostly because it is difficult to explain
> to new contributors that review cycles for significant patch series
> take a lot more time than they could imagine), so not many people
> volunteer to mentor or co-mentor.

I agree on this. Mentoring takes some time as well.
The coding part is the easy part, the communication part with
the community is the harder part.

For example the Git community is spread across all time zones.
So if you want to give everyone the opportunity to give input to
your patches, you want to wait 24 hours.

The community consists of people with diverse backgrounds, e.g.
some work on Git in their spare time on the weekends [only]. Others
work on it as their job [so it is rather a Mon-Fri activity]; others glance
at the mailing list in the lunch break or in the evenings.

These timing issues are not the hardest part,
but the easiest to explain shortly. ;)

> I still hope though that over time some former GSoC student will
> become (co-)mentors as this happens quite often in other projects that
> participates in the GSoC.


[PATCH 0/2] doc hash-function-transition: minor & major clarifications

2018-03-26 Thread Ævar Arnfjörð Bjarmason
Having read through the hash-function-transition.txt again, a couple
of things jumped out at me:

Ævar Arnfjörð Bjarmason (2):
  doc hash-function-transition: clarify how older gits die on NewHash

We weren't accurately describing how "git status" would die on NewHash
repos on new versions.

  doc hash-function-transition: clarify what SHAttered means

I don't think we had a good summary of how SHA-1 vulnerabilities
overlap with concerns Git has, now that we've moved to the hardened
SHA-1.

I may very well have gotten this new summary subtly wrong though. So
please review.

 .../technical/hash-function-transition.txt| 40 +++
 1 file changed, 32 insertions(+), 8 deletions(-)

-- 
2.16.2.804.g6dcf76e118



[PATCH 2/2] doc hash-function-transition: clarify what SHAttered means

2018-03-26 Thread Ævar Arnfjörð Bjarmason
Attempt to clarify what the SHAttered attack means in practice for
Git. The previous version of the text made no mention whatsoever of
Git already having a mitigation for this specific attack, which the
SHAttered researchers claim will detect cryptanalytic collision
attacks.

I may have gotten some of the nuances wrong, but as far as I know this
new text accurately summarizes the current situation with SHA-1 in
git. I.e. git doesn't really use SHA-1 anymore, it uses
Hardened-SHA-1 (they just so happen to produce the same outputs
99.999...% of the time).

Thus the previous text was incorrect in asserting that:

[...]As a result [of SHAttered], SHA-1 cannot be considered
cryptographically secure any more[...]

That's not the case. We have a mitigation against SHAttered, *however*
we consider it prudent to move to work towards a NewHash should future
vulnerabilities in either SHA-1 or Hardened-SHA-1 emerge.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 .../technical/hash-function-transition.txt| 29 +++
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/hash-function-transition.txt 
b/Documentation/technical/hash-function-transition.txt
index 34396f13ec..34b8b83a34 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -28,11 +28,30 @@ advantages:
   address stored content.
 
 Over time some flaws in SHA-1 have been discovered by security
-researchers. https://shattered.io demonstrated a practical SHA-1 hash
-collision. As a result, SHA-1 cannot be considered cryptographically
-secure any more. This impacts the communication of hash values because
-we cannot trust that a given hash value represents the known good
-version of content that the speaker intended.
+researchers. On 23 February 2017 the SHAttered attack
+(https://shattered.io) demonstrated a practical SHA-1 hash collision.
+
+Git v2.13.0 and later subsequently moved to a hardened SHA-1
+implementation by default, which isn't vulnerable to the SHAttered
+attack.
+
+Thus Git has in effect already migrated to a new hash that isn't SHA-1
+and doesn't share its vulnerabilities, its new hash function just
+happens to produce exactly the same output for all known inputs,
+except two PDFs published by the SHAttered researchers, and the new
+implementation (written by those researchers) claims to detect future
+cryptanalytic collision attacks.
+
+Regardless, it's considered prudent to move past any variant of SHA-1
+to a new hash. There's no guarantee that future attacks on SHA-1 won't
+be published in the future, and those attacks may not have viable
+mitigations.
+
+If SHA-1 and its variants were to be truly broken Git's hash function
+could not be considered cryptographically secure any more. This would
+impact the communication of hash values because we could not trust
+that a given hash value represented the known good version of content
+that the speaker intended.
 
 SHA-1 still possesses the other properties such as fast object lookup
 and safe error checking, but other hash functions are equally suitable
-- 
2.16.2.804.g6dcf76e118



[PATCH 1/2] doc hash-function-transition: clarify how older gits die on NewHash

2018-03-26 Thread Ævar Arnfjörð Bjarmason
Change the "Repository format extension" to accurately describe what
happens with different versions of Git when they encounter NewHash
repositories, instead of only saying what happens with versions v2.7.0
and later.

See ab9cb76f66 ("Repository format version check.", 2005-11-25) and
00a09d57eb ("introduce "extensions" form of
core.repositoryformatversion", 2015-06-23) for the relevant changes to
the setup code where these variables are checked.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/technical/hash-function-transition.txt | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/hash-function-transition.txt 
b/Documentation/technical/hash-function-transition.txt
index 417ba491d0..34396f13ec 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -116,10 +116,15 @@ Documentation/technical/repository-version.txt) with 
extensions
objectFormat = newhash
compatObjectFormat = sha1
 
-Specifying a repository format extension ensures that versions of Git
-not aware of NewHash do not try to operate on these repositories,
-instead producing an error message:
+The combination of setting `core.repositoryFormatVersion=1` and
+populating `extensions.*` ensures that all versions of Git later than
+`v0.99.9l` will die instead of trying to operate on the NewHash
+repository, instead producing an error message.
 
+   # Between v0.99.9l and v2.7.0
+   $ git status
+   fatal: Expected git repo version <= 0, found 1
+   # After v2.7.0
$ git status
fatal: unknown repository extensions found:
objectformat
-- 
2.16.2.804.g6dcf76e118



Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 1:57 PM, Junio C Hamano wrote:

Jeff Hostetler  writes:


I defined that routine to take a uint64_t because I wanted to
pass a nanosecond value received from getnanotime() and that's
what it returns.


Hmph, but the target format does not have different representation
of inttypes in different sizes, no?

I personally doubt that we would benefit from having a group of
functions (i.e. format_int{8,16,32,64}_to_json()) that callers have
to choose from, depending on the exact size of the integer they want
to serialize.  The de-serializing side would be the same story.

Even if the variable a potential caller of the formatter wants to
serialize is a sized type that is different from uintmax_t, the
caller shouldn't have to add an extra cast.

Am I missing some obvious merit for having these separate functions
for explicit sizes?



I did the uint64_t for the unsigned ns times.

I did the other one for the usual signed ints.

I could convert them both to a single signed 64 bit typed function
if we only want to have one function.

Jeff



Re: [PATCH 00/27] sb/object-store updates

2018-03-26 Thread Junio C Hamano
Stefan Beller  writes:

> Thanks for driving this when I was away!
>
> With the fixup patch, both series are
> Reviewed-by: Stefan Beller 

I think everybody involved agrees that these two you cited above are
already in good shape.  Let's have them in 'next' for the remainder
of this cycle and go incremental, and merge them to 'master' soon
after 2.17 final is tagged.

Thanks all for working well together.  



Re: Git Merge contributor summit notes

2018-03-26 Thread Brandon Williams
On 03/26, Jeff Hostetler wrote:
> 
> 
> On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote:
> > 
> > On Sat, Mar 10 2018, Alex Vandiver wrote:
> > 
> > > New hash (Stefan, etc)
> > > --
> > >   - discussed on the mailing list
> > >   - actual plan checked in to 
> > > Documentation/technical/hash-function-transition.txt
> > >   - lots of work renaming
> > >   - any actual work with the transition plan?
> > >   - local conversion first; fetch/push have translation table
> > >   - like git-svn
> > >   - also modified pack and index format to have lookup/translation 
> > > efficiently
> > >   - brian's series to eliminate SHA1 strings from the codebase
> > >   - testsuite is not working well because hardcoded SHA1 values
> > >   - flip a bit in the sha1 computation and see what breaks in the 
> > > testsuite
> > >   - will also need a way to do the conversion itself; traverse and write 
> > > out new version
> > >   - without that, can start new repos, but not work on old ones
> > >   - on-disk formats will need to change -- something to keep in mind with 
> > > new index work
> > >   - documentation describes packfile and index formats
> > >   - what time frame are we talking?
> > >   - public perception question
> > >   - signing commits doesn't help (just signs commit object) unless you 
> > > "recursive sign"
> > >   - switched to SHA1dc; we detect and reject known collision technique
> > >   - do it now because it takes too long if we start when the collision 
> > > drops
> > >   - always call it "new hash" to reduce bikeshedding
> > >   - is translation table a backdoor? has it been reviewed by crypto folks?
> > > - no, but everything gets translated
> > >   - meant to avoid a flag day for entire repositories
> > >   - linus can decide to upgrade to newhash; if pushes to server that is 
> > > not newhash aware, that's fine
> > >   - will need a wire protocol change
> > >   - v2 might add a capability for newhash
> > >   - "now that you mention md5, it's a good idea"
> > >   - can use md5 to test the conversion
> > >   - is there a technical reason for why not /n/ hashes?
> > >   - the slow step goes away as people converge to the new hash
> > >   - beneficial to make up some fake hash function for testing
> > >   - is there a plan on how we decide which hash function?
> > >   - trust junio to merge commits when appropriate
> > >   - conservancy committee explicitly does not make code decisions
> > >   - waiting will just give better data
> > >   - some hash functions are in silicon (e.g. microsoft cares)
> > >   - any movement in libgit2 / jgit?
> > > - basic stuff for libgit2; same testsuite problems
> > > - no work in jgit
> > >   - most optimistic forecast?
> > > - could be done in 1-2y
> > >   - submodules with one hash function?
> > > - unable to convert project unless all submodules are converted
> > > - OO-ing is not a prereq
> > 
> > Late reply, but one thing I brought up at the time is that we'll want to
> > keep this code around even after the NewHash migration at least for
> > testing purposes, should we ever need to move to NewNewHash.
> > 
> > It occurred to me recently that once we have such a layer it could be
> > (ab)used with some relatively minor changes to do any arbitrary
> > local-to-remote object content translation, unless I've missed something
> > (but I just re-read hash-function-transition.txt now...).
> > 
> > E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
> > remote server so that you upload a GPG encrypted version of all your
> > blobs, and have your trees reference those blobs.
> > 
> > Because we'd be doing arbitrary translations for all of
> > commits/trees/blobs this could go further than other bolted-on
> > encryption solutions for Git. E.g. paths in trees could be encrypted
> > too, as well as all the content of the commit object that isn't parent
> > info & the like (but that would have different hashes).
> > 
> > Basically clean/smudge filters on steroids, but for every object in the
> > repo. Anyone who got a hold of it would still see the shape of the repo
> > & approximate content size, but other than that it wouldn't be more info
> > than they'd get via `fast-export --anonymize` now.
> > 
> > I mainly find it interesting because it presents an intersection between a
> > feature we might want to offer anyway, and something that would stress
> > the hash transition codepath going forward, to make sure it hasn't all
> > bitrotted by the time we'll need NewHash->NewNewHash.
> > 
> > Git hosting providers would hate it, but they should probably be
> > charging users by how much Michael Haggerty's git-sizer tool hates their
> > repo anyway :)
> > 
> 
> While we are converting to a new hash function, it would be nice
> if we could add a couple of fields to the end of the OID:  the object
> type and the raw uncompressed object size.
> 
> It would be nice if we could extend the OID to include 6 bytes of data

Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Junio C Hamano
Jeff Hostetler  writes:

> I am concerned that the above compiler error message says that uintmax_t
> is defined as an "unsigned long" (which is defined as *at least* 32 bits,
> but not necessarily 64).  But a uint64_t is defined as an "unsigned long long"
> and guaranteed to be a 64-bit value.

On a platform whose uintmax_t is u32, is it realistic to expect that
we would be able to use u64, even if we explicitly ask for it, in
the first place?

In other words, on a platform that handles uint64_t, I would expect
uintmax_t to be wide enough to hold an uint64_t value without
truncation.


Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Junio C Hamano
Jeff Hostetler  writes:

> I defined that routine to take a uint64_t because I wanted to
> pass a nanosecond value received from getnanotime() and that's
> what it returns.

Hmph, but the target format does not have different representation
of inttypes in different sizes, no?  

I personally doubt that we would benefit from having a group of
functions (i.e. format_int{8,16,32,64}_to_json()) that callers have
to choose from, depending on the exact size of the integer they want
to serialize.  The de-serializing side would be the same story.

Even if the variable a potential caller of the formatter uses is a sized
type that is different from uintmax_t, the caller shouldn't have to
add an extra cast.

Am I missing some obvious merit for having these separate functions
for explicit sizes?


Re: Git Merge contributor summit notes

2018-03-26 Thread Stefan Beller
On Mon, Mar 26, 2018 at 10:33 AM Jeff Hostetler 
wrote:



> On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Sat, Mar 10 2018, Alex Vandiver wrote:
> >
> >> New hash (Stefan, etc)
> >> ----------------------
> >>   - discussed on the mailing list
> >>   - actual plan checked in to
Documentation/technical/hash-function-transition.txt
> >>   - lots of work renaming
> >>   - any actual work with the transition plan?
> >>   - local conversion first; fetch/push have translation table
> >>   - like git-svn
> >>   - also modified pack and index format to have lookup/translation
efficiently
> >>   - brian's series to eliminate SHA1 strings from the codebase
> >>   - testsuite is not working well because hardcoded SHA1 values
> >>   - flip a bit in the sha1 computation and see what breaks in the
testsuite
> >>   - will also need a way to do the conversion itself; traverse and
write out new version
> >>   - without that, can start new repos, but not work on old ones
> >>   - on-disk formats will need to change -- something to keep in mind
with new index work
> >>   - documentation describes packfile and index formats
> >>   - what time frame are we talking?
> >>   - public perception question
> >>   - signing commits doesn't help (just signs commit object) unless you
"recursive sign"
> >>   - switched to SHA1dc; we detect and reject known collision technique
> >>   - do it now because it takes too long if we start when the collision
drops
> >>   - always call it "new hash" to reduce bikeshedding
> >>   - is translation table a backdoor? has it been reviewed by crypto
folks?
> >> - no, but everything gets translated
> >>   - meant to avoid a flag day for entire repositories
> >>   - linus can decide to upgrade to newhash; if pushes to server that
is not newhash aware, that's fine
> >>   - will need a wire protocol change
> >>   - v2 might add a capability for newhash
> >>   - "now that you mention md5, it's a good idea"
> >>   - can use md5 to test the conversion
> >>   - is there a technical reason for why not /n/ hashes?
> >>   - the slow step goes away as people converge to the new hash
> >>   - beneficial to make up some fake hash function for testing
> >>   - is there a plan on how we decide which hash function?
> >>   - trust junio to merge commits when appropriate
> >>   - conservancy committee explicitly does not make code decisions
> >>   - waiting will just give better data
> >>   - some hash functions are in silicon (e.g. microsoft cares)
> >>   - any movement in libgit2 / jgit?
> >> - basic stuff for libgit2; same testsuite problems
> >> - no work in jgit
> >>   - most optimistic forecast?
> >> - could be done in 1-2y
> >>   - submodules with one hash function?
> >> - unable to convert project unless all submodules are converted
> >> - OO-ing is not a prereq
> >
> > Late reply, but one thing I brought up at the time is that we'll want to
> > keep this code around even after the NewHash migration at least for
> > testing purposes, should we ever need to move to NewNewHash.
> >
> > It occurred to me recently that once we have such a layer it could be
> > (ab)used with some relatively minor changes to do any arbitrary
> > local-to-remote object content translation, unless I've missed something
> > (but I just re-read hash-function-transition.txt now...).
> >
> > E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
> > remote server so that you upload a GPG encrypted version of all your
> > blobs, and have your trees reference those blobs.
> >
> > Because we'd be doing arbitrary translations for all of
> > commits/trees/blobs this could go further than other bolted-on
> > encryption solutions for Git. E.g. paths in trees could be encrypted
> > too, as well as all the content of the commit object that isn't parent
> > info & the like (but that would have different hashes).
> >
> > Basically clean/smudge filters on steroids, but for every object in the
> > repo. Anyone who got a hold of it would still see the shape of the repo
> > & approximate content size, but other than that it wouldn't be more info
> > than they'd get via `fast-export --anonymize` now.
> >
> > I mainly find it interesting because it presents an intersection between a
> > feature we might want to offer anyway, and something that would stress
> > the hash transition codepath going forward, to make sure it hasn't all
> > bitrotted by the time we'll need NewHash->NewNewHash.
> >
> > Git hosting providers would hate it, but they should probably be
> > charging users by how much Michael Haggerty's git-sizer tool hates their
> > repo anyway :)
> >

> While we are converting to a new hash function, it would be nice
> if we could add a couple of fields to the end of the OID:  the object
> type and the raw uncompressed object size.

This would allow one to craft invalid OIDs, i.e. the correct hash value with
the wrong object type. (This is a different kind of "invalid" compared 

Re: [PATCH 00/27] sb/object-store updates

2018-03-26 Thread Stefan Beller
On Fri, Mar 23, 2018 at 10:31 PM Duy Nguyen  wrote:

> I got too excited when searching and replacing. Here's the fixup
> patch.

Thanks for picking up the series that I left unattended over the last weeks.
I have reviewed nd/remove-ignore-env-field as well as sb/object-store
as currently queued and found them well done.

Thanks for driving this when I was away!

With the fixup patch, both series are
Reviewed-by: Stefan Beller 
(No need to apply that as it already looks funny to have
3-5 sign offs including mine)

With these 2 series reviewed, I'll continue reviewing
sb/pack-files-in-repository and the rest of my mail box.

Thanks,
Stefan

> -- 8< --
> Subject: [PATCH] fixup! object-store: move packed_git and packed_git_mru
to
>   object store

> ---
>   packfile.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

> diff --git a/packfile.c b/packfile.c
> index 63c89ee31a..0906b8f741 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1937,7 +1937,7 @@ static int add_promisor_object(const struct
object_id *oid,

>  /*
>   * If this is a tree, commit, or tag, the objects it refers
> -* to are also promisor objects-> (Blobs refer to no objects->)
> +* to are also promisor objects. (Blobs refer to no objects.)
>   */
>  if (obj->type == OBJ_TREE) {
>  struct tree *tree = (struct tree *)obj;
> --
> 2.17.0.rc0.348.gd5a49e0b6f

> -- 8< --


Re: [RFC PATCH 1/1] json-writer: incorrect format specifier

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 1:04 PM, Junio C Hamano wrote:

Ramsay Jones  writes:


@@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char 
*key, uint64_t value)
maybe_add_comma(jw);
  
append_quoted_string(&jw->json, key);

-   strbuf_addf(&jw->json, ":%"PRIuMAX, value);
+   strbuf_addf(&jw->json, ":%"PRIu64, value);


In this code-base, that would normally be written as:

strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t) value);


heh, I should learn not to reply in a hurry, just before
going out ...

I had not noticed that 'value' was declared with an 'sized type'
of uint64_t, so using PRIu64 should be fine.


But why is this codepath using a sized type in the first place?  It
is not like it wants to read/write a fixed binary file format---it
just wants to use an integer type that is wide enough to handle any
inttype the platform uses, for which uintmax_t would be a more
appropriate type, no?



[Somehow the conversation forked and this compiler warning
appeared in both the json-writer and the rebase-interactive
threads.  I'm copying here the response that I already made
on the latter.]


I defined that routine to take a uint64_t because I wanted to
pass a nanosecond value received from getnanotime() and that's
what it returns.

My preference would be to change the PRIuMAX to PRIu64, but there
aren't any other references in the code to that symbol and I didn't
want to start a new trend here.

I am concerned that the above compiler error message says that uintmax_t
is defined as an "unsigned long" (which is defined as *at least* 32 bits,
but not necessarily 64).  But a uint64_t is defined as an "unsigned long long"
and guaranteed to be a 64-bit value.

So while I'm not really worried about 128 bit integers right now, I'm
more concerned about 32 bit compilers truncating that value without any
warnings.

Jeff


Re: [RFC PATCH v2 1/1] json-writer: add cast to uintmax_t

2018-03-26 Thread Jeff Hostetler



On 3/24/2018 2:38 PM, Wink Saville wrote:

Correct a compile error on Mac OSX by adding a cast to uintmax_t
in calls to strbuf_addf.

Helped-by: Ramsay Jones 
Tested-by: travis-ci
Signed-off-by: Wink Saville 
---
  json-writer.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/json-writer.c b/json-writer.c
index 89a6abb57..1f40482ff 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char 
*key, uint64_t value)
maybe_add_comma(jw);
  
append_quoted_string(&jw->json, key);

-   strbuf_addf(&jw->json, ":%"PRIuMAX, value);
+   strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t)value);
  }
  
  void jw_object_double(struct json_writer *jw, const char *fmt,

@@ -225,7 +225,7 @@ void jw_array_uint64(struct json_writer *jw, uint64_t value)
assert_in_array(jw);
maybe_add_comma(jw);
  
-	strbuf_addf(&jw->json, "%"PRIuMAX, value);

+   strbuf_addf(&jw->json, "%"PRIuMAX, (uintmax_t)value);
  }
  
  void jw_array_double(struct json_writer *jw, const char *fmt, double value)




FYI.  I included and squashed this change into V4 of my json-writer series.

Jeff



Re: [PATCH v3 2/3] fast-import: introduce mem_pool type

2018-03-26 Thread Eric Sunshine
On Mon, Mar 26, 2018 at 1:03 PM, Jameson Miller  wrote:
> Introduce the mem_pool type which encapsulates all the information
> necessary to manage a pool of memory.This change moves the existing

s/memory\.This/memory. This/

> variables in fast-import used to support the global memory pool to use
> this structure.
>
> These changes allow for the multiple instances of a memory pool to
> exist and be reused outside of fast-import. In a future commit the
> mem_pool type will be moved to its own file.
>
> Signed-off-by: Jameson Miller 


Re: Git Merge contributor summit notes

2018-03-26 Thread Jeff Hostetler



On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote:


On Sat, Mar 10 2018, Alex Vandiver wrote:


New hash (Stefan, etc)
----------------------
  - discussed on the mailing list
  - actual plan checked in to 
Documentation/technical/hash-function-transition.txt
  - lots of work renaming
  - any actual work with the transition plan?
  - local conversion first; fetch/push have translation table
  - like git-svn
  - also modified pack and index format to have lookup/translation efficiently
  - brian's series to eliminate SHA1 strings from the codebase
  - testsuite is not working well because hardcoded SHA1 values
  - flip a bit in the sha1 computation and see what breaks in the testsuite
  - will also need a way to do the conversion itself; traverse and write out 
new version
  - without that, can start new repos, but not work on old ones
  - on-disk formats will need to change -- something to keep in mind with new 
index work
  - documentation describes packfile and index formats
  - what time frame are we talking?
  - public perception question
  - signing commits doesn't help (just signs commit object) unless you "recursive 
sign"
  - switched to SHA1dc; we detect and reject known collision technique
  - do it now because it takes too long if we start when the collision drops
  - always call it "new hash" to reduce bikeshedding
  - is translation table a backdoor? has it been reviewed by crypto folks?
- no, but everything gets translated
  - meant to avoid a flag day for entire repositories
  - linus can decide to upgrade to newhash; if pushes to server that is not 
newhash aware, that's fine
  - will need a wire protocol change
  - v2 might add a capability for newhash
  - "now that you mention md5, it's a good idea"
  - can use md5 to test the conversion
  - is there a technical reason for why not /n/ hashes?
  - the slow step goes away as people converge to the new hash
  - beneficial to make up some fake hash function for testing
  - is there a plan on how we decide which hash function?
  - trust junio to merge commits when appropriate
  - conservancy committee explicitly does not make code decisions
  - waiting will just give better data
  - some hash functions are in silicon (e.g. microsoft cares)
  - any movement in libgit2 / jgit?
- basic stuff for libgit2; same testsuite problems
- no work in jgit
  - most optimistic forecast?
- could be done in 1-2y
  - submodules with one hash function?
- unable to convert project unless all submodules are converted
- OO-ing is not a prereq


Late reply, but one thing I brought up at the time is that we'll want to
keep this code around even after the NewHash migration at least for
testing purposes, should we ever need to move to NewNewHash.

It occurred to me recently that once we have such a layer it could be
(ab)used with some relatively minor changes to do any arbitrary
local-to-remote object content translation, unless I've missed something
(but I just re-read hash-function-transition.txt now...).

E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
remote server so that you upload a GPG encrypted version of all your
blobs, and have your trees reference those blobs.

Because we'd be doing arbitrary translations for all of
commits/trees/blobs this could go further than other bolted-on
encryption solutions for Git. E.g. paths in trees could be encrypted
too, as well as all the content of the commit object that isn't parent
info & the like (but that would have different hashes).

Basically clean/smudge filters on steroids, but for every object in the
repo. Anyone who got a hold of it would still see the shape of the repo
& approximate content size, but other than that it wouldn't be more info
than they'd get via `fast-export --anonymize` now.

I mainly find it interesting because it presents an intersection between a
feature we might want to offer anyway, and something that would stress
the hash transition codepath going forward, to make sure it hasn't all
bitrotted by the time we'll need NewHash->NewNewHash.

Git hosting providers would hate it, but they should probably be
charging users by how much Michael Haggerty's git-sizer tool hates their
repo anyway :)



While we are converting to a new hash function, it would be nice
if we could add a couple of fields to the end of the OID:  the object
type and the raw uncompressed object size.

It would be nice if we could extend the OID to include 6 bytes of data
(4 or 8 bits for the type and the rest for the raw object size), and
just say that an OID is a {hash,type,size} tuple.

There are lots of places where we open an object to see what type it is
or how big it is.  This requires uncompressing/undeltafying the object
(or at least decoding enough to get the header).  In the case of missing
objects (partial clone or a gvfs-like projection) it requires either
dynamically fetching the object or asking an object-size-server for the
data.

All of these cases could be 

Re: [PATCH v2 01/36] t/helper: add an empty test-tool program

2018-03-26 Thread Duy Nguyen
On Mon, Mar 26, 2018 at 5:27 PM, Johannes Schindelin
 wrote:
> Hi Duy,
>
> On Sat, 24 Mar 2018, Nguyễn Thái Ngọc Duy wrote:
>
>> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
>> new file mode 100644
>> index 00..c730f718ca
>> --- /dev/null
>> +++ b/t/helper/test-tool.c
>> @@ -0,0 +1,27 @@
>> +#include "git-compat-util.h"
>> +#include "test-tool.h"
>> +
>> +struct test_cmd {
>> + const char *name;
>> + int (*main)(int argc, const char **argv);
>
> This makes the build fail on Windows, as we override `main` in
> compat/mingw.h:

Sigh.. not complaining, but I wish somebody would try to compile git with
wine (and automate it in travis). That way we could at least cover the
compilation part for all major platforms. Probably too small for a
GSoC (and making the test suite pass with wine may be too large for
GSoC).
-- 
Duy


[PATCH v3 1/3] fast-import: rename mem_pool type to mp_block

2018-03-26 Thread Jameson Miller
This is part of a patch series to extract the memory pool logic in
fast-import into a more generalized version. The existing mem_pool type
maps more closely to a "block of memory" (mp_block) in the more
generalized memory pool. This commit renames the mem_pool to mp_block to
reduce churn in future patches.

Signed-off-by: Jameson Miller 
---
 fast-import.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 58ef360da4..6c3215d7c3 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -209,8 +209,8 @@ struct last_object {
unsigned no_swap : 1;
 };
 
-struct mem_pool {
-   struct mem_pool *next_pool;
+struct mp_block {
+   struct mp_block *next_block;
char *next_free;
char *end;
uintmax_t space[FLEX_ARRAY]; /* more */
@@ -304,9 +304,9 @@ static int global_argc;
 static const char **global_argv;
 
 /* Memory pools */
-static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mem_pool);
+static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block);
 static size_t total_allocd;
-static struct mem_pool *mem_pool;
+static struct mp_block *mp_block_head;
 
 /* Atom management */
 static unsigned int atom_table_sz = 4451;
@@ -636,14 +636,14 @@ static unsigned int hc_str(const char *s, size_t len)
 
 static void *pool_alloc(size_t len)
 {
-   struct mem_pool *p;
+   struct mp_block *p;
void *r;
 
/* round up to a 'uintmax_t' alignment */
if (len & (sizeof(uintmax_t) - 1))
len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
 
-   for (p = mem_pool; p; p = p->next_pool)
+   for (p = mp_block_head; p; p = p->next_block)
if ((p->end - p->next_free >= len))
break;
 
@@ -652,12 +652,12 @@ static void *pool_alloc(size_t len)
total_allocd += len;
return xmalloc(len);
}
-   total_allocd += sizeof(struct mem_pool) + mem_pool_alloc;
-   p = xmalloc(st_add(sizeof(struct mem_pool), mem_pool_alloc));
-   p->next_pool = mem_pool;
+   total_allocd += sizeof(struct mp_block) + mem_pool_alloc;
+   p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc));
+   p->next_block = mp_block_head;
p->next_free = (char *) p->space;
p->end = p->next_free + mem_pool_alloc;
-   mem_pool = p;
+   mp_block_head = p;
}
 
r = p->next_free;
-- 
2.14.3



[PATCH v3 2/3] fast-import: introduce mem_pool type

2018-03-26 Thread Jameson Miller
Introduce the mem_pool type which encapsulates all the information
necessary to manage a pool of memory.This change moves the existing
variables in fast-import used to support the global memory pool to use
this structure.

These changes allow for the multiple instances of a memory pool to
exist and be reused outside of fast-import. In a future commit the
mem_pool type will be moved to its own file.

Signed-off-by: Jameson Miller 
---
 fast-import.c | 80 +--
 1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 6c3215d7c3..e85512191c 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -216,6 +216,19 @@ struct mp_block {
uintmax_t space[FLEX_ARRAY]; /* more */
 };
 
+struct mem_pool {
+   struct mp_block *mp_block;
+
+   /*
+* The amount of available memory to grow the pool by.
+* This size does not include the overhead for the mp_block.
+*/
+   size_t block_alloc;
+
+   /* The total amount of memory allocated by the pool. */
+   size_t pool_alloc;
+};
+
 struct atom_str {
struct atom_str *next_atom;
unsigned short str_len;
@@ -304,9 +317,7 @@ static int global_argc;
 static const char **global_argv;
 
 /* Memory pools */
-static size_t mem_pool_alloc = 2*1024*1024 - sizeof(struct mp_block);
-static size_t total_allocd;
-static struct mp_block *mp_block_head;
+static struct mem_pool fi_mem_pool =  {0, 2*1024*1024 - sizeof(struct 
mp_block), 0 };
 
 /* Atom management */
 static unsigned int atom_table_sz = 4451;
@@ -324,6 +335,7 @@ static off_t pack_size;
 /* Table of objects we've written. */
 static unsigned int object_entry_alloc = 5000;
 static struct object_entry_pool *blocks;
+static size_t total_allocd;
 static struct object_entry *object_table[1 << 16];
 static struct mark_set *marks;
 static const char *export_marks_file;
@@ -634,7 +646,21 @@ static unsigned int hc_str(const char *s, size_t len)
return r;
 }
 
-static void *pool_alloc(size_t len)
+static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t 
block_alloc)
+{
+   struct mp_block *p;
+
+   mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
+   p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
+   p->next_block = mem_pool->mp_block;
+   p->next_free = (char *)p->space;
+   p->end = p->next_free + block_alloc;
+   mem_pool->mp_block = p;
+
+   return p;
+}
+
+static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
struct mp_block *p;
void *r;
@@ -643,21 +669,17 @@ static void *pool_alloc(size_t len)
if (len & (sizeof(uintmax_t) - 1))
len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
 
-   for (p = mp_block_head; p; p = p->next_block)
-   if ((p->end - p->next_free >= len))
-   break;
+   for (p = mem_pool->mp_block; p; p = p->next_block)
+   if (p->end - p->next_free >= len)
+   break;
 
if (!p) {
-   if (len >= (mem_pool_alloc/2)) {
-   total_allocd += len;
+   if (len >= (mem_pool->block_alloc / 2)) {
+   mem_pool->pool_alloc += len;
return xmalloc(len);
}
-   total_allocd += sizeof(struct mp_block) + mem_pool_alloc;
-   p = xmalloc(st_add(sizeof(struct mp_block), mem_pool_alloc));
-   p->next_block = mp_block_head;
-   p->next_free = (char *) p->space;
-   p->end = p->next_free + mem_pool_alloc;
-   mp_block_head = p;
+
+   p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
}
 
r = p->next_free;
@@ -665,10 +687,10 @@ static void *pool_alloc(size_t len)
return r;
 }
 
-static void *pool_calloc(size_t count, size_t size)
+static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t 
size)
 {
-   size_t len = count * size;
-   void *r = pool_alloc(len);
+   size_t len = st_mult(count, size);
+   void *r = mem_pool_alloc(mem_pool, len);
memset(r, 0, len);
return r;
 }
@@ -676,7 +698,7 @@ static void *pool_calloc(size_t count, size_t size)
 static char *pool_strdup(const char *s)
 {
size_t len = strlen(s) + 1;
-   char *r = pool_alloc(len);
+   char *r = mem_pool_alloc(&fi_mem_pool, len);
memcpy(r, s, len);
return r;
 }
@@ -685,7 +707,7 @@ static void insert_mark(uintmax_t idnum, struct 
object_entry *oe)
 {
struct mark_set *s = marks;
while ((idnum >> s->shift) >= 1024) {
-   s = pool_calloc(1, sizeof(struct mark_set));
+   s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
s->shift = marks->shift + 10;
s->data.sets[0] = marks;
marks = s;
@@ 

Re: [PATCH v7 00/13] nd/pack-objects-pack-struct updates

2018-03-26 Thread Duy Nguyen
On Mon, Mar 26, 2018 at 5:13 PM, Jeff King  wrote:
> On Sat, Mar 24, 2018 at 07:33:40AM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> +unsigned long oe_get_size_slow(struct packing_data *pack,
>> +const struct object_entry *e)
>> +{
>> + struct packed_git *p;
>> + struct pack_window *w_curs;
>> + unsigned char *buf;
>> + enum object_type type;
>> + unsigned long used, avail, size;
>> +
>> + if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
>> + read_lock();
>> + if (sha1_object_info(e->idx.oid.hash, &size) < 0)
>> + die(_("unable to get size of %s"),
>> + oid_to_hex(&e->idx.oid));
>> + read_unlock();
>> + return size;
>> + }
>> +
>> + p = oe_in_pack(pack, e);
>> + if (!p)
>> + die("BUG: when e->type is a delta, it must belong to a pack");
>> +
>> + read_lock();
>> + w_curs = NULL;
>> + buf = use_pack(p, &w_curs, e->in_pack_offset, &avail);
>> + used = unpack_object_header_buffer(buf, avail, &type, &size);
>> + if (used == 0)
>> + die(_("unable to parse object header of %s"),
>> + oid_to_hex(&e->idx.oid));
>> +
>> + unuse_pack(&w_curs);
>> + read_unlock();
>> + return size;
>> +}
>
> It took me a while to figure out why this treated deltas and non-deltas
> differently. At first I thought it was an optimization (since we can
> find non-delta sizes quickly by looking at the headers).  But I think
> it's just that you want to know the size of the actual _delta_, not the
> reconstructed object. And there's no way to ask sha1_object_info() for
> that.
>
> Perhaps the _extended version of that function should learn an
> OBJECT_INFO_NO_DEREF flag or something to tell it to return the true delta
> type and size. Then this whole function could just become a single call.
>
> But short of that, it's probably worth a comment explaining what's going
> on.

I thought the elaboration on "size" in the big comment block in front
of struct object_entry was enough. I was wrong. Will add something
here.

>> +Running tests with special setups
>> +---------------------------------
>> +
>> +The whole test suite could be run to test some special features
>> +that cannot be easily covered by a few specific test cases. These
>> +could be enabled by running the test suite with correct GIT_TEST_
>> +environment set.
>> +
>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>> +
>> +GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>> +path where there are more than 1024 packs even if the actual number of
>> +packs in repository is below this limit.
>> +
>> +GIT_TEST_OE_SIZE_BITS=<n> exercises the uncommon pack-objects
>> +code path where we do not cache object size in memory and read it
>> +from existing packs on demand. This normally only happens when the
>> +object size is over 2GB. This variable forces the code path on any
>> +object larger than 2^<n> bytes.
>
> It's nice to have these available to test the uncommon cases. But I have
> a feeling nobody will ever run them, since it requires extra effort (and
> takes a full test run).

I know :) I also know that this does not interfere with
GIT_TEST_SPLIT_INDEX, which is being run in Travis. So the plan (after
this series is merged) is to make Travis second run to do something
like

make test GIT_TEST_SPLIT...=1 GIT_TEST_FULL..=1 GIT_TEST_OE..=4

we don't waste more cpu cycles and we can make sure these code paths
are always run (at least on one platform)

> I see there's a one-off test for GIT_TEST_FULL_IN_PACK_ARRAY, which I
> think is a good idea, since it makes sure the code is exercised in a
> normal test suite run. Should we do the same for GIT_TEST_OE_SIZE_BITS?

I think the problem with OE_SIZE_BITS is that it has many different code
paths (like reused deltas), which makes it hard to ensure they all run. But yes
I think I could construct a pack that executes both code paths in
oe_get_size_slow(). Will do in a reroll.

> I haven't done an in-depth read of each patch yet; this was just what
> jumped out at me from reading the interdiff.

I would really appreciate it if you could find some time to do it. The
bugs I found in this round proved that I had no idea what's really
going on in pack-objects. Sure I know the big picture but that's far
from enough to do changes like this.
-- 
Duy


[PATCH v3 0/3] Extract memory pool logic into reusable component

2018-03-26 Thread Jameson Miller
Changes from v2:

  - Code Review Reactions
  
  - Lifetime management functions for mem_pool will be included in
future patch series

This patch series extracts the memory pool implementation, currently
used by fast-import, into a generalized component. This memory pool
can then be generally used by any component that needs a pool of
memory.

This patch is in preparation for a change to speed up the loading of
indexes with a large number of cache entries by reducing the number of
malloc() calls. For a rough example of how this component will be used
in this capacity, please see [1].

[1] 
https://github.com/jamill/git/compare/block_allocation_base...jamill:block_allocation

Jameson Miller (3):
  fast-import: rename mem_pool type to mp_block
  fast-import: introduce mem_pool type
  Move reusable parts of memory pool into its own file

 Makefile  |  1 +
 fast-import.c | 74 +++
 mem-pool.c| 55 
 mem-pool.h| 34 +++
 4 files changed, 104 insertions(+), 60 deletions(-)
 create mode 100644 mem-pool.c
 create mode 100644 mem-pool.h

-- 
2.14.3



Re: [RFC PATCH 1/1] json-writer: incorrect format specifier

2018-03-26 Thread Junio C Hamano
Ramsay Jones  writes:

>>> @@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const 
>>> char *key, uint64_t value)
>>> maybe_add_comma(jw);
>>>  
>>> append_quoted_string(&jw->json, key);
>>> -   strbuf_addf(&jw->json, ":%"PRIuMAX, value);
>>> +   strbuf_addf(&jw->json, ":%"PRIu64, value);
>> 
>> In this code-base, that would normally be written as:
>> 
>>  strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t) value);
>
> heh, I should learn not to reply in a hurry, just before
> going out ...
>
> I had not noticed that 'value' was declared with an 'sized type'
> of uint64_t, so using PRIu64 should be fine.

But why is this codepath using a sized type in the first place?  It
is not like it wants to read/write a fixed binary file format---it
just wants to use an integer type that is wide enough to handle any
inttype the platform uses, for which uintmax_t would be a more
appropriate type, no?


[PATCH v3 3/3] Move reusable parts of memory pool into its own file

2018-03-26 Thread Jameson Miller
This moves the reusable parts of the memory pool logic used by
fast-import.c into its own file for use by other components.

Signed-off-by: Jameson Miller 
---
 Makefile  |  1 +
 fast-import.c | 70 +--
 mem-pool.c| 55 ++
 mem-pool.h| 34 +
 4 files changed, 91 insertions(+), 69 deletions(-)
 create mode 100644 mem-pool.c
 create mode 100644 mem-pool.h

diff --git a/Makefile b/Makefile
index a1d8775adb..1e142b1dd9 100644
--- a/Makefile
+++ b/Makefile
@@ -832,6 +832,7 @@ LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
+LIB_OBJS += mem-pool.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
diff --git a/fast-import.c b/fast-import.c
index e85512191c..12c0058c06 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -168,6 +168,7 @@ Format of STDIN stream:
 #include "dir.h"
 #include "run-command.h"
 #include "packfile.h"
+#include "mem-pool.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)

-static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc)
-{
-   struct mp_block *p;
-
-   mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
-   p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
-   p->next_block = mem_pool->mp_block;
-   p->next_free = (char *)p->space;
-   p->end = p->next_free + block_alloc;
-   mem_pool->mp_block = p;
-
-   return p;
-}
-
-static void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
-{
-   struct mp_block *p;
-   void *r;
-
-   /* round up to a 'uintmax_t' alignment */
-   if (len & (sizeof(uintmax_t) - 1))
-   len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
-
-   for (p = mem_pool->mp_block; p; p = p->next_block)
-   if (p->end - p->next_free >= len)
-   break;
-
-   if (!p) {
-   if (len >= (mem_pool->block_alloc / 2)) {
-   mem_pool->pool_alloc += len;
-   return xmalloc(len);
-   }
-
-   p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
-   }
-
-   r = p->next_free;
-   p->next_free += len;
-   return r;
-}
-
-static void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t 
size)
-{
-   size_t len = st_mult(count, size);
-   void *r = mem_pool_alloc(mem_pool, len);
-   memset(r, 0, len);
-   return r;
-}
-
 static char *pool_strdup(const char *s)
 {
size_t len = strlen(s) + 1;
diff --git a/mem-pool.c b/mem-pool.c
new file mode 100644
index 00..389d7af447
--- /dev/null
+++ b/mem-pool.c
@@ -0,0 +1,55 @@
+/*
+ * Memory Pool implementation logic.
+ */
+
+#include "cache.h"
+#include "mem-pool.h"
+
+static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t 
block_alloc)
+{
+   struct mp_block *p;
+
+   mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc;
+   p = xmalloc(st_add(sizeof(struct mp_block), block_alloc));
+   p->next_block = mem_pool->mp_block;
+   p->next_free = (char *)p->space;
+   p->end = p->next_free + block_alloc;
+   mem_pool->mp_block = p;
+
+   return p;
+}
+
+void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
+{
+   struct mp_block *p;
+   void *r;
+
+   /* round up to a 'uintmax_t' alignment */
+   if (len & (sizeof(uintmax_t) - 1))
+   len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
+
+   for (p = mem_pool->mp_block; p; p = p->next_block)
+   if (p->end - p->next_free >= len)
+   break;
+
+   if (!p) {
+   if (len >= (mem_pool->block_alloc / 2)) {
+   mem_pool->pool_alloc += len;
+   return xmalloc(len);
+   }
+
+   p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
+   }
+
+   r = p->next_free;
+   p->next_free += len;
+   return r;
+}
+
+void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size)

Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 11:56 AM, Junio C Hamano wrote:

> Wink Saville  writes:
>
>> json-writer.c:123:38:  error:  format specifies type 'uintmax_t' (aka
>> 'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned
>> long long') [-Werror,-Wformat]
>>
>> strbuf_addf(&jw->json, ":%"PRIuMAX, value);
>>  ~~ ^
>> json-writer.c:228:37:  error:  format specifies type 'uintmax_t' (aka
>> 'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned
>> long long') [-Werror,-Wformat]
>>
>> strbuf_addf(&jw->json, "%"PRIuMAX, value);
>>  ~~ ^
>> 2 errors generated.
>> make: *** [json-writer.o] Error 1
>> make: *** Waiting for unfinished jobs
>
> For whatever reason, our codebase seems to shy away from PRIu64,
> even though there are liberal uses of PRIu32.  Showing the value
> casted to uintmax_t with PRIuMAX seems to be our preferred way to
> say "We cannot say how wide this type is on different platforms, and
> are playing safe by using widest-possible int type" (e.g. showing a
> pid_t value from daemon.c).
>
> In this codepath, the actual values are specified to be uint64_t, so
> the use of PRIu64 may be OK, but I have to wonder why the codepath
> is not dealing with uintmax_t in the first place.  When even larger
> than present archs are prevalent in N years and 64-bit starts to
> feel a tad small (like we feel for 16-bit ints these days), it will
> feel a bit silly to have a subsystem that is limited to such a
> "fixed and a tad small these days" type and pretend it to be a
> generic serializer, I suspect.



I defined that routine to take a uint64_t because I wanted to
pass a nanosecond value received from getnanotime() and that's
what it returns.

My preference would be to change the PRIuMAX to PRIu64, but there
aren't any other references in the code to that symbol and I didn't
want to start a new trend here.

I am concerned that the above compiler error message says that uintmax_t
is defined as an "unsigned long" (which is defined as *at least* 32 bits,
but not necessarily 64), while a uint64_t is defined as an "unsigned long
long" and guaranteed to be a 64-bit value.

So while I'm not really worried about 128 bit integers right now, I'm
more concerned about 32 bit compilers truncating that value without any
warnings.

Jeff



[PATCH/RFC 4/5] git.c: implement --list-cmds=porcelain

2018-03-26 Thread Nguyễn Thái Ngọc Duy
This is useful for git-completion.bash because it needs this set of
commands. Right now we have to maintain a separate command category in
there.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 contrib/completion/git-completion.bash | 94 ++
 git.c  |  2 +
 help.c | 12 
 help.h |  1 +
 t/t9902-completion.sh  |  4 +-
 5 files changed, 20 insertions(+), 93 deletions(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index e0f545819d..d711a9b53a 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -833,14 +833,15 @@ __git_commands () {
then
printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}"
else
-   git --list-cmds=all
+   git --list-cmds=$1
fi
 }
 
 __git_list_all_commands ()
 {
local i IFS=" "$'\n'
-   for i in $(__git_commands)
+   local category=${1-all}
+   for i in $(__git_commands $category)
do
case $i in
*--*) : helper pattern;;
@@ -856,98 +857,11 @@ __git_compute_all_commands ()
__git_all_commands=$(__git_list_all_commands)
 }
 
-__git_list_porcelain_commands ()
-{
-   local i IFS=" "$'\n'
-   __git_compute_all_commands
-   for i in $__git_all_commands
-   do
-   case $i in
-   *--*) : helper pattern;;
-   applymbox): ask gittus;;
-   applypatch)   : ask gittus;;
-   archimport)   : import;;
-   cat-file) : plumbing;;
-   check-attr)   : plumbing;;
-   check-ignore) : plumbing;;
-   check-mailmap): plumbing;;
-   check-ref-format) : plumbing;;
-   checkout-index)   : plumbing;;
-   column)   : internal helper;;
-   commit-tree)  : plumbing;;
-   count-objects): infrequent;;
-   credential)   : credentials;;
-   credential-*) : credentials helper;;
-   cvsexportcommit)  : export;;
-   cvsimport): import;;
-   cvsserver): daemon;;
-   daemon)   : daemon;;
-   diff-files)   : plumbing;;
-   diff-index)   : plumbing;;
-   diff-tree): plumbing;;
-   fast-import)  : import;;
-   fast-export)  : export;;
-   fsck-objects) : plumbing;;
-   fetch-pack)   : plumbing;;
-   fmt-merge-msg): plumbing;;
-   for-each-ref) : plumbing;;
-   hash-object)  : plumbing;;
-   http-*)   : transport;;
-   index-pack)   : plumbing;;
-   init-db)  : deprecated;;
-   local-fetch)  : plumbing;;
-   ls-files) : plumbing;;
-   ls-remote): plumbing;;
-   ls-tree)  : plumbing;;
-   mailinfo) : plumbing;;
-   mailsplit): plumbing;;
-   merge-*)  : plumbing;;
-   mktree)   : plumbing;;
-   mktag): plumbing;;
-   pack-objects) : plumbing;;
-   pack-redundant)   : plumbing;;
-   pack-refs): plumbing;;
-   parse-remote) : plumbing;;
-   patch-id) : plumbing;;
-   prune): plumbing;;
-   prune-packed) : plumbing;;
-   quiltimport)  : import;;
-   read-tree): plumbing;;
-   receive-pack) : plumbing;;
-   remote-*) : transport;;
-   rerere)   : plumbing;;
-   rev-list) : plumbing;;
-   rev-parse): plumbing;;
-   runstatus): plumbing;;
-   sh-setup) : internal;;
-   shell): daemon;;
-   show-ref) : plumbing;;
-   send-pack): plumbing;;
-   show-index)   : plumbing;;
-   ssh-*): transport;;
-   stripspace)   : plumbing;;
-   symbolic-ref) : plumbing;;
-   unpack-file)  : plumbing;;
-   unpack-objects)   : plumbing;;
-   update-index) : plumbing;;
-   update-ref)   : plumbing;;
-   update-server-info) : daemon;;
-   upload-archive)   : plumbing;;
-   upload-pack)  : plumbing;;
-   write-tree)   : plumbing;;
-   var)  : infrequent;;
-   verify-pack)  

[PATCH/RFC 5/5] help: add "-a --verbose" to list all commands with synopsis

2018-03-26 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-help.txt |  4 ++-
 builtin/help.c |  6 
 help.c | 61 ++
 help.h |  1 +
 4 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-help.txt b/Documentation/git-help.txt
index 40d328a4b3..a40fc38d8b 100644
--- a/Documentation/git-help.txt
+++ b/Documentation/git-help.txt
@@ -8,7 +8,7 @@ git-help - Display help information about Git
 SYNOPSIS
 
 [verse]
-'git help' [-a|--all] [-g|--guide]
+'git help' [-a|--all [--verbose]] [-g|--guide]
   [-i|--info|-m|--man|-w|--web] [COMMAND|GUIDE]
 
 DESCRIPTION
@@ -42,6 +42,8 @@ OPTIONS
 --all::
Prints all the available commands on the standard output. This
option overrides any given command or guide name.
+   When used with `--verbose` print description for all recognized
+   commands.
 
 -g::
 --guides::
diff --git a/builtin/help.c b/builtin/help.c
index 598867cfea..a371199674 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -36,6 +36,7 @@ static const char *html_path;
 
 static int show_all = 0;
 static int show_guides = 0;
+static int verbose;
 static unsigned int colopts;
 static enum help_format help_format = HELP_FORMAT_NONE;
 static int exclude_guides;
@@ -48,6 +49,7 @@ static struct option builtin_help_options[] = {
HELP_FORMAT_WEB),
OPT_SET_INT('i', "info", _format, N_("show info page"),
HELP_FORMAT_INFO),
+   OPT__VERBOSE(&verbose, N_("print command description")),
OPT_END(),
 };
 
@@ -463,6 +465,10 @@ int cmd_help(int argc, const char **argv, const char 
*prefix)
 
if (show_all) {
git_config(git_help_config, NULL);
+   if (verbose) {
+   list_all_cmds_help();
+   return 0;
+   }
printf(_("usage: %s%s"), _(git_usage_string), "\n\n");
load_command_list("git-", _cmds, _cmds);
list_commands(colopts, _cmds, _cmds);
diff --git a/help.c b/help.c
index cacd8249bb..afbf98c241 100644
--- a/help.c
+++ b/help.c
@@ -282,6 +282,67 @@ void list_porcelain_cmds(void)
}
 }
 
+static int cmd_category_cmp(const void *elem1, const void *elem2)
+{
+   const struct cmdname_help *e1 = elem1;
+   const struct cmdname_help *e2 = elem2;
+
+   if (e1->category < e2->category)
+   return -1;
+   if (e1->category > e2->category)
+   return 1;
+   return strcmp(e1->name, e2->name);
+}
+
+static const char *get_category_name(unsigned int category)
+{
+   switch (category) {
+   case CAT_ancillaryinterrogators: return _("Ancillary interrogators");
+   case CAT_ancillarymanipulators: return _("Ancillary manipulators");
+   case CAT_foreignscminterface: return _("Foreign SCM interface");
+   case CAT_mainporcelain: return _("Main porcelain");
+   case CAT_plumbinginterrogators: return _("Plumbing interrogators");
+   case CAT_plumbingmanipulators: return _("Plumbing manipulators");
+   case CAT_purehelpers: return _("Pure helpers");
+   case CAT_synchelpers: return _("Sync helpers");
+   case CAT_synchingrepositories: return _("Synching repositories");
+   default:
+   die("BUG: unknown command category %u", category);
+   }
+}
+
+void list_all_cmds_help(void)
+{
+   int i, longest = 0;
+   int current_category = -1;
+   int nr = ARRAY_SIZE(command_list);
+   struct cmdname_help *cmds = command_list;
+
+   for (i = 0; i < nr; i++) {
+   struct cmdname_help *cmd = cmds + i;
+
+   if (longest < strlen(cmd->name))
+   longest = strlen(cmd->name);
+   }
+
+   QSORT(cmds, nr, cmd_category_cmp);
+
+   puts(_("These are all Git commands:"));
+
+   for (i = 0; i < nr; i++) {
+   struct cmdname_help *cmd = cmds + i;
+
+   if (cmd->category != current_category) {
+   current_category = cmd->category;
+   printf("\n%s:\n", get_category_name(current_category));
+   }
+
+   printf("   %s   ", cmd->name);
+   mput_char(' ', longest - strlen(cmd->name));
+   puts(_(cmd->help));
+   }
+}
+
 int is_in_cmdlist(struct cmdnames *c, const char *s)
 {
int i;
diff --git a/help.h b/help.h
index 33e2210ebd..62449f1b7e 100644
--- a/help.h
+++ b/help.h
@@ -17,6 +17,7 @@ static inline void mput_char(char c, unsigned int num)
 }
 
 extern void list_common_cmds_help(void);
+extern void list_all_cmds_help(void);
 extern void list_all_cmds(void);
 extern void list_porcelain_cmds(void);
 extern const char *help_unknown_cmd(const char *cmd);
-- 
2.17.0.rc0.348.gd5a49e0b6f



[PATCH/RFC 3/5] generate-cmdlist.sh: keep all information in common-cmds.h

2018-03-26 Thread Nguyễn Thái Ngọc Duy
common-cmds.h is used to extract the list of common commands (by
group) and a one-line summary of each command. Some information is
dropped, for example command category or summary of other commands.
Update generate-cmdlist.sh to keep all the information. The extra info
will be used shortly.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 generate-cmdlist.sh | 46 ++---
 help.c  | 43 --
 2 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index eeea4b67ea..72235e7296 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -2,9 +2,10 @@
 
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
-   char name[16];
+   char name[32];
char help[80];
-   unsigned char group;
+   unsigned int category;
+   unsigned int group;
 };
 
 static const char *common_cmd_groups[] = {"
@@ -23,27 +24,50 @@ sed -n '
' "$1"
 printf '};\n\n'
 
+echo "#define GROUP_NONE 0xff /* no common group */"
+echo "#define GROUP_ 0xff /* no common group */"
 n=0
-substnum=
 while read grp
 do
-   echo "^git-..*[ ]$grp"
-   substnum="$substnum${substnum:+;}s/[]$grp/$n/"
+   echo "#define GROUP_$grp $n"
n=$(($n+1))
-done <"$grps" >"$match"
+done <"$grps"
+echo
 
-printf 'static struct cmdname_help common_cmds[] = {\n'
-grep -f "$match" "$1" |
+echo '/*'
+printf 'static const char *cmd_categories[] = {\n'
+grep '^git-' "$1" |
+awk '{print $2;}' |
+sort |
+uniq |
+while read category; do
+   printf '\t\"'$category'\",\n'
+done
+printf '\tNULL\n};\n\n'
+echo '*/'
+
+n=0
+grep '^git-' "$1" |
+awk '{print $2;}' |
+sort |
+uniq |
+while read category; do
+   echo "#define CAT_$category $n"
+   n=$(($n+1))
+done
+echo
+
+printf 'static struct cmdname_help command_list[] = {\n'
+grep "^git-" "$1" |
 sed 's/^git-//' |
 sort |
-while read cmd tags
+while read cmd category tags
 do
-   tag=$(echo "$tags" | sed "$substnum; s/[^0-9]//g")
sed -n '
/^NAME/,/git-'"$cmd"'/H
${
x
-   s/.*git-'"$cmd"' - \(.*\)/  {"'"$cmd"'", N_("\1"), 
'$tag'},/
+   s/.*git-'"$cmd"' - \(.*\)/  {"'"$cmd"'", N_("\1"), 
CAT_'$category', GROUP_'$tags' },/
p
}' "Documentation/git-$cmd.txt"
 done
diff --git a/help.c b/help.c
index f3f35dfbb1..4d07ea3913 100644
--- a/help.c
+++ b/help.c
@@ -190,6 +190,28 @@ void list_commands(unsigned int colopts,
}
 }
 
+static void extract_common_cmds(struct cmdname_help **p_common_cmds,
+   int *p_nr)
+{
+   int i, nr = 0;
+   struct cmdname_help *common_cmds;
+
+   ALLOC_ARRAY(common_cmds, ARRAY_SIZE(command_list));
+
+   for (i = 0; i < ARRAY_SIZE(command_list); i++) {
+   const struct cmdname_help *cmd = command_list + i;
+
+   if (cmd->category != CAT_mainporcelain ||
+   cmd->group == GROUP_NONE)
+   continue;
+
+   common_cmds[nr++] = *cmd;
+   }
+
+   *p_common_cmds = common_cmds;
+   *p_nr = nr;
+}
+
 static int cmd_group_cmp(const void *elem1, const void *elem2)
 {
const struct cmdname_help *e1 = elem1;
@@ -206,17 +228,21 @@ void list_common_cmds_help(void)
 {
int i, longest = 0;
int current_grp = -1;
+   int nr = 0;
+   struct cmdname_help *common_cmds;
+
+   extract_common_cmds(&common_cmds, &nr);
 
-   for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
+   for (i = 0; i < nr; i++) {
if (longest < strlen(common_cmds[i].name))
longest = strlen(common_cmds[i].name);
}
 
-   QSORT(common_cmds, ARRAY_SIZE(common_cmds), cmd_group_cmp);
+   QSORT(common_cmds, nr, cmd_group_cmp);
 
puts(_("These are common Git commands used in various situations:"));
 
-   for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
+   for (i = 0; i < nr; i++) {
if (common_cmds[i].group != current_grp) {
printf("\n%s\n", 
_(common_cmd_groups[common_cmds[i].group]));
current_grp = common_cmds[i].group;
@@ -226,6 +252,7 @@ void list_common_cmds_help(void)
mput_char(' ', longest - strlen(common_cmds[i].name));
puts(_(common_cmds[i].help));
}
+   free(common_cmds);
 }
 
 void list_all_cmds(void)
@@ -298,8 +325,9 @@ static const char bad_interpreter_advice[] =
 
 const char *help_unknown_cmd(const char *cmd)
 {
-   int i, n, best_similarity = 0;
+   int i, n, best_similarity = 0, nr_common;
struct cmdnames main_cmds, other_cmds;
+   struct cmdname_help *common_cmds;
 
memset(&main_cmds, 0, sizeof(main_cmds));
memset(&other_cmds, 0, sizeof(other_cmds));
@@ -314,6 +342,8 @@ const char 

[PATCH/RFC 1/5] git.c: convert --list-builtins to --list-cmds=builtins

2018-03-26 Thread Nguyễn Thái Ngọc Duy
Even if this is a hidden option, let's make it a bit more generic
since we're introducing more listing types.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 git.c   | 7 +--
 t/t0012-help.sh | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/git.c b/git.c
index ceaa58ef40..f350002260 100644
--- a/git.c
+++ b/git.c
@@ -205,8 +205,11 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
}
(*argv)++;
(*argc)--;
-   } else if (!strcmp(cmd, "--list-builtins")) {
-   list_builtins();
+   } else if (skip_prefix(cmd, "--list-cmds=", )) {
+   if (!strcmp(cmd, "builtins"))
+   list_builtins();
+   else
+   die("unsupported command listing type '%s'", 
cmd);
exit(0);
} else {
fprintf(stderr, _("unknown option: %s\n"), cmd);
diff --git a/t/t0012-help.sh b/t/t0012-help.sh
index 487b92a5de..fd2a7f27dc 100755
--- a/t/t0012-help.sh
+++ b/t/t0012-help.sh
@@ -50,7 +50,7 @@ test_expect_success "--help does not work for guides" "
 "
 
 test_expect_success 'generate builtin list' '
-   git --list-builtins >builtins
+   git --list-cmds=builtins >builtins
 '
 
 while read builtin
-- 
2.17.0.rc0.348.gd5a49e0b6f



[PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-03-26 Thread Nguyễn Thái Ngọc Duy
This is pretty rough but I'd like to see how people feel about this
first.

I notice we have two places for command classification. One in
command-list.txt, one in __git_list_porcelain_commands() in
git-completion.bash. People who are following nd/parseopt-completion
probably know that I'm trying to reduce duplication in this script as
much as possible; this is another step towards that.

By keeping all information of command-list.txt in git binary, we could
provide the porcelain list to git-completion.bash via "git
--list-cmds=porcelain", so we don't neeed a separate command
classification in git-completion.bash anymore.

Because we have all command synopses as a side effect, we could
now support "git help -a --verbose", which prints something like "git
help" does (a command name and a description), but for _all_
recognized commands. This could help people look for a command even if
we don't provide "git apropos".

PS. Elsewhere I introduced --list-builtin-cmds, which should become
--list-cmds=builtin if this series seems like a good idea to move
forward.

Nguyễn Thái Ngọc Duy (5):
  git.c: convert --list-builtins to --list-cmds=builtins
  git.c: implement --list-cmds=all and use it in git-completion.bash
  generate-cmdlist.sh: keep all information in common-cmds.h
  git.c: implement --list-cmds=porcelain
  help: add "-a --verbose" to list all commands with synopsis

 Documentation/git-help.txt |   4 +-
 builtin/help.c |   6 ++
 contrib/completion/git-completion.bash |  94 +-
 generate-cmdlist.sh|  46 ++---
 git.c  |  11 ++-
 help.c | 131 +++--
 help.h |   3 +
 t/t0012-help.sh|   2 +-
 t/t9902-completion.sh  |   4 +-
 9 files changed, 187 insertions(+), 114 deletions(-)

-- 
2.17.0.rc0.348.gd5a49e0b6f



[PATCH/RFC 2/5] git.c: implement --list-cmds=all and use it in git-completion.bash

2018-03-26 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 contrib/completion/git-completion.bash |  2 +-
 git.c  |  2 ++
 help.c | 15 +++
 help.h |  1 +
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index c7957f0a90..e0f545819d 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -833,7 +833,7 @@ __git_commands () {
then
printf "%s" "${GIT_TESTING_COMMAND_COMPLETION}"
else
-   git help -a|egrep '^  [a-zA-Z0-9]'
+   git --list-cmds=all
fi
 }
 
diff --git a/git.c b/git.c
index f350002260..2e0c5e17e2 100644
--- a/git.c
+++ b/git.c
@@ -208,6 +208,8 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
} else if (skip_prefix(cmd, "--list-cmds=", )) {
if (!strcmp(cmd, "builtins"))
list_builtins();
+   else if (!strcmp(cmd, "all"))
+   list_all_cmds();
else
die("unsupported command listing type '%s'", 
cmd);
exit(0);
diff --git a/help.c b/help.c
index 60071a9bea..f3f35dfbb1 100644
--- a/help.c
+++ b/help.c
@@ -228,6 +228,21 @@ void list_common_cmds_help(void)
}
 }
 
+void list_all_cmds(void)
+{
+   struct cmdnames main_cmds, other_cmds;
+   int i;
+
+   memset(&main_cmds, 0, sizeof(main_cmds));
+   memset(&other_cmds, 0, sizeof(other_cmds));
+   load_command_list("git-", &main_cmds, &other_cmds);
+
+   for (i = 0; i < main_cmds.cnt; i++)
+   puts(main_cmds.names[i]->name);
+   for (i = 0; i < other_cmds.cnt; i++)
+   puts(other_cmds.names[i]->name);
+}
+
 int is_in_cmdlist(struct cmdnames *c, const char *s)
 {
int i;
diff --git a/help.h b/help.h
index b21d7c94e8..0bf29f8dc5 100644
--- a/help.h
+++ b/help.h
@@ -17,6 +17,7 @@ static inline void mput_char(char c, unsigned int num)
 }
 
 extern void list_common_cmds_help(void);
+extern void list_all_cmds(void);
 extern const char *help_unknown_cmd(const char *cmd);
 extern void load_command_list(const char *prefix,
  struct cmdnames *main_cmds,
-- 
2.17.0.rc0.348.gd5a49e0b6f



Re: [PATCH v2] branch: implement shortcut to delete last branch

2018-03-26 Thread Junio C Hamano
g...@matthieu-moy.fr writes:

>> That said, I'd still be OK with it.
>
> I don't have objection either.

FWIW, I do not even buy the "destructive commands should force
spelling things out even more" argument in the first place.

$ git checkout somelongtopicname
$ work work work
$ git checkout master && git merge -
$ git branch -d -

would be a lot less error-prone than the user being forced to write
last step in longhand

$ git branch -d someotherlongtopicname

and destroying an unrelated but similarly named branch.

So obviously I am OK with it, too.

As long as we do not regress end-user experience, that is.  For
example, "git merge @{-1}" in the above sequence would record the
fact that the resulting commit is a merge of 'somelongtopicname',
not literally "@{-1}", in its log message.  It would be a sad
regression if it suddenly starts to say "Merge branch '-'" [*1*],
for example.


[Reference]

*1* https://public-inbox.org/git/xmqqinnsegxb@gitster.mtv.corp.google.com/




Re: [ANNOUNCE] Git v2.17.0-rc1

2018-03-26 Thread Johannes Schindelin
Hi Bryan,

On Fri, 23 Mar 2018, Bryan Turner wrote:

> On Fri, Mar 23, 2018 at 10:47 AM, Johannes Schindelin
>  wrote:
> >
> > On Wed, 21 Mar 2018, Junio C Hamano wrote:
> >
> >> A release candidate Git v2.17.0-rc1 is now available for testing
> >> at the usual places.  It is comprised of 493 non-merge commits
> >> since v2.16.0, contributed by 62 people, 19 of which are new faces.
> >>
> >> The tarballs are found at:
> >>
> >> 
> >> https://www.kernel.org/pub/software/scm/git/testing/
> >
> > And Git for Windows v2.17.0-rc1 can be found here:
> >
> > https://github.com/git-for-windows/git/releases/tag/v2.17.0-rc1.windows.1
> >
> > Please test so that we can hammer out a robust v2.17.0!
> 
> I've added 2.16.3 and 2.17.0-rc1, for both Linux and Windows, to the
> test matrix for Bitbucket Server. All ~1500 tests have passed for all
> 4 versions.

Thank you so much for testing!

Everybody else: remember that I can only fix bugs pre-emptively in time
for v2.17.0 if you test and report...

Ciao,
Johannes


Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Junio C Hamano
Wink Saville  writes:

> json-writer.c:123:38:  error:  format specifies type 'uintmax_t' (aka
> 'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned
> long long') [-Werror,-Wformat]
>
> strbuf_addf(&jw->json, ":%"PRIuMAX, value);
>  ~~ ^
> json-writer.c:228:37:  error:  format specifies type 'uintmax_t' (aka
> 'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned
> long long') [-Werror,-Wformat]
>
> strbuf_addf(&jw->json, "%"PRIuMAX, value);
>  ~~ ^
> 2 errors generated.
> make: *** [json-writer.o] Error 1
> make: *** Waiting for unfinished jobs

For whatever reason, our codebase seems to shy away from PRIu64,
even though there are liberal uses of PRIu32.  Showing the value
casted to uintmax_t with PRIuMAX seems to be our preferred way to
say "We cannot say how wide this type is on different platforms, and
are playing safe by using widest-possible int type" (e.g. showing a
pid_t value from daemon.c).

In this codepath, the actual values are specified to be uint64_t, so
the use of PRIu64 may be OK, but I have to wonder why the codepath
is not dealing with uintmax_t in the first place.  When even larger
than present archs are prevalent in N years and 64-bit starts to
feel a tad small (like we feel for 16-bit ints these days), it will
feel a bit silly to have a subsystem that is limited to such a
"fixed and a tad small these days" types and pretend it to be be a
generic seriealizer, I suspect.


Re: [PATCH v2 01/36] t/helper: add an empty test-tool program

2018-03-26 Thread Johannes Schindelin
Hi Duy,

On Sat, 24 Mar 2018, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> new file mode 100644
> index 00..c730f718ca
> --- /dev/null
> +++ b/t/helper/test-tool.c
> @@ -0,0 +1,27 @@
> +#include "git-compat-util.h"
> +#include "test-tool.h"
> +
> +struct test_cmd {
> + const char *name;
> + int (*main)(int argc, const char **argv);

This makes the build fail on Windows, as we override `main` in
compat/mingw.h:

/*
 * A replacement of main() that adds win32 specific initialization.
 */

void mingw_startup(void);
#define main(c,v) dummy_decl_mingw_main(void); \
static int mingw_main(c,v); \
int main(int argc, const char **argv) \
{ \
mingw_startup(); \
return mingw_main(__argc, (void *)__argv); \
} \
static int mingw_main(c,v)


I know, I know, now that common-main.c is a thing, we could simply pluck a
`platform_specific_pre_main()` or some such, which is overridden in
compat/mingw.h to `mingw_startup` and to a no-op in git-compat-util.h as a
fall-back.

The truth is: I simply did not get around to doing this yet.

In the meantime, could we have this SQUASH???, please?

-- snipsnap --
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index cd5e28b045a..87066ced62a 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -3,7 +3,7 @@
 
 struct test_cmd {
const char *name;
-   int (*main)(int argc, const char **argv);
+   int (*fn)(int argc, const char **argv);
 };
 
 static struct test_cmd cmds[] = {
@@ -55,7 +55,7 @@ int cmd_main(int argc, const char **argv)
if (!strcmp(cmds[i].name, argv[1])) {
argv++;
argc--;
-   return cmds[i].main(argc, argv);
+   return cmds[i].fn(argc, argv);
}
}
die("There is no test named '%s'", argv[1]);


Re: [PATCH v7 00/13] nd/pack-objects-pack-struct updates

2018-03-26 Thread Jeff King
On Sat, Mar 24, 2018 at 07:33:40AM +0100, Nguyễn Thái Ngọc Duy wrote:

> +unsigned long oe_get_size_slow(struct packing_data *pack,
> +const struct object_entry *e)
> +{
> + struct packed_git *p;
> + struct pack_window *w_curs;
> + unsigned char *buf;
> + enum object_type type;
> + unsigned long used, avail, size;
> +
> + if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) {
> + read_lock();
> + if (sha1_object_info(e->idx.oid.hash, &size) < 0)
> + die(_("unable to get size of %s"),
> + oid_to_hex(&e->idx.oid));
> + read_unlock();
> + return size;
> + }
> +
> + p = oe_in_pack(pack, e);
> + if (!p)
> + die("BUG: when e->type is a delta, it must belong to a pack");
> +
> + read_lock();
> + w_curs = NULL;
> + buf = use_pack(p, &w_curs, e->in_pack_offset, &avail);
> + used = unpack_object_header_buffer(buf, avail, &type, &size);
> + if (used == 0)
> + die(_("unable to parse object header of %s"),
> + oid_to_hex(&e->idx.oid));
> +
> + unuse_pack(&w_curs);
> + read_unlock();
> + return size;
> +}

It took me a while to figure out why this treated deltas and non-deltas
differently. At first I thought it was an optimization (since we can
find non-delta sizes quickly by looking at the headers).  But I think
it's just that you want to know the size of the actual _delta_, not the
reconstructed object. And there's no way to ask sha1_object_info() for
that.

Perhaps the _extended version of that function should learn an
OBJECT_INFO_NO_DEREF flag or something to tell it return the true delta
type and size. Then this whole function could just become a single call.

But short of that, it's probably worth a comment explaining what's going
on.

> +static void prepare_in_pack_by_idx(struct packing_data *pdata)
> +{
> + struct packed_git **mapping, *p;
> + int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
> +
> + if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
> + /*
> +  * leave in_pack_by_idx NULL to force in_pack[] to be
> +  * used instead
> +  */
> + return;
> + }

Minor nit, but can we use git_env_bool() here? It's just as easy, and
it's less surprising in some corner cases.

>  struct object_entry *packlist_alloc(struct packing_data *pdata,
>   const unsigned char *sha1,
>   uint32_t index_pos)
>  {
>   struct object_entry *new_entry;
>  
> + if (!pdata->nr_objects) {
> + prepare_in_pack_by_idx(pdata);
> + if (getenv("GIT_TEST_OE_SIZE_BITS")) {
> + int bits = atoi(getenv("GIT_TEST_OE_SIZE_BITS"));;
> + pdata->oe_size_limit = 1 << bits;
> + }
> + if (!pdata->oe_size_limit)
> + pdata->oe_size_limit = 1 << OE_SIZE_BITS;
> + }

Ditto here; I think this could just be:

  pdata->oe_size_limit = git_env_ulong("GIT_TEST_OE_SIZE_BITS",
   (1 << OE_SIZE_BITS));

>   if (pdata->nr_objects >= pdata->nr_alloc) {
>   pdata->nr_alloc = (pdata->nr_alloc  + 1024) * 3 / 2;
>   REALLOC_ARRAY(pdata->objects, pdata->nr_alloc);
> +
> + if (!pdata->in_pack_by_idx)
> + REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
>   }

I was going to complain that we don't use ALLOC_GROW() here, but
actually that part is in the context. ;)

> @@ -35,7 +36,9 @@ enum dfs_state {
>   *
>   * "size" is the uncompressed object size. Compressed size of the raw
>   * data for an object in a pack is not stored anywhere but is computed
> - * and made available when reverse .idx is made.
> + * and made available when reverse .idx is made. Note that when an
> + * delta is reused, "size" is the uncompressed _delta_ size, not the
> + * canonical one after the delta has been applied.

s/an delta/a delta/

> +Running tests with special setups
> +---------------------------------
> +
> +The whole test suite could be run to test some special features
> +that cannot be easily covered by a few specific test cases. These
> +could be enabled by running the test suite with correct GIT_TEST_
> +environment set.
> +
> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
> +
> +GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
> +path where there are more than 1024 packs even if the actual number of
> +packs in repository is below this limit.
> +
> +GIT_TEST_OE_SIZE_BITS=<n> exercises the uncommon pack-objects
> +code path where we do not cache object size in memory and read it
> +from existing packs on demand. This normally only happens when the
> +object size is over 2GB. This variable forces the code path on any
> +object larger than 2^<n> bytes.

It's nice to have these available to test the uncommon cases.

Re: [PATCH v7 0/3] RUNTIME_PREFIX relocatable Git

2018-03-26 Thread Daniel Jacques
On Mon, Mar 26, 2018 at 10:08 AM Ævar Arnfjörð Bjarmason 
wrote:

> > Oh sorry, I must have missed that. I have a personal preference for
> > adding brackets for clarity; it leaked into this patch set. I did
> > implement most of the suggestion, which was to use the escaped Q/E
> > instead of equals.
> >
> > Stylistically I still prefer the braces, but I'll defer to you and
> > remove them in my pending patch set in case I'm asked to submit
> > another version.

> If you prefer it that way just keep your version. It's your code and
> it's just a trivial style difference.

> I just mentioned it because in the previous discussion you said "I agree
> it's cleaner" so I inferred that you'd just forgotten about it but meant
> to change it. It's also fine if later you just thought "you know what,
> I'm doing it my way" :)

Honestly there's enough of a delay here that I don't remember, but if I had
to guess I probably just forgot to make the change :)


Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Edward Thomson
On Mon, Mar 26, 2018 at 2:08 PM, Derrick Stolee  wrote:
> Since most heavily-used tools that didn't spawn Git processes use
> LibGit2 to interact with Git repos, I added Ed Thomson to CC to see
> if libgit2 could ever write these bad header comments.

We added the `sorted` capability to our `packed-refs` header relatively
recently (approximately two months ago, v0.27.0 will be the first release
to include it as of today).  So, at the moment, libgit2 writes:

  "# pack-refs with: peeled fully-peeled sorted "

Prior to this change, libgit2's header was stable for the last five years
as:

  "# pack-refs with: peeled fully-peeled "

And prior to that, we advertised only `peeled`:

  "# pack-refs with: peeled "

Thanks for thinking of us!

-ed


Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Jeff King
On Mon, Mar 26, 2018 at 09:08:04AM -0400, Derrick Stolee wrote:

> Since most heavily-used tools that didn't spawn Git processes use LibGit2 to
> interact with Git repos, I added Ed Thomson to CC to see if libgit2 could
> ever write these bad header comments.

Ed can probably answer more definitively, but I dug in the libgit2
history a little, and I think it has only ever generated correct
"pack-refs with:" lines.

Ditto for JGit, though there it blames down to 1a6964c82 (Initial JGit
contribution to eclipse.org, 2009-09-29). I didn't dig further into the
historical JGit repository, but I think that's probably far enough to
feel good about it.

-Peff


[PATCH v4] json_writer: new routines to create data in JSON format

2018-03-26 Thread git
From: Jeff Hostetler 

Add a series of jw_ routines and "struct json_writer" structure to compose
JSON data.  The resulting string data can then be output by commands wanting
to support a JSON output format.

The json-writer routines can be used to generate structured data in a
JSON-like format.  We say "JSON-like" because we do not enforce the Unicode
(usually UTF-8) requirement on string fields.  Internally, Git does not
necessarily have Unicode/UTF-8 data for most fields, so it is currently
unclear the best way to enforce that requirement.  For example, on Linux
pathnames can contain arbitrary 8-bit character data, so a command like
"status" would not know how to encode the reported pathnames.  We may want
to revisit this (or double encode such strings) in the future.

The initial use for the json-writer routines is for generating telemetry
data for executed Git commands.  Later, we may want to use them in other
commands, such as status.

Helped-by: René Scharfe 
Helped-by: Wink Saville 
Helped-by: Ramsay Jones 
Signed-off-by: Jeff Hostetler 
---
 Makefile|   2 +
 json-writer.c   | 395 +
 json-writer.h   |  92 +++
 t/helper/test-json-writer.c | 590 
 t/t0019-json-writer.sh  | 253 +++
 5 files changed, 1332 insertions(+)
 create mode 100644 json-writer.c
 create mode 100644 json-writer.h
 create mode 100644 t/helper/test-json-writer.c
 create mode 100755 t/t0019-json-writer.sh

diff --git a/Makefile b/Makefile
index 1a9b23b..57f58e6 100644
--- a/Makefile
+++ b/Makefile
@@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
+TEST_PROGRAMS_NEED_X += test-json-writer
 TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
 TEST_PROGRAMS_NEED_X += test-line-buffer
 TEST_PROGRAMS_NEED_X += test-match-trees
@@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
 LIB_OBJS += ident.o
+LIB_OBJS += json-writer.o
 LIB_OBJS += kwset.o
 LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
diff --git a/json-writer.c b/json-writer.c
new file mode 100644
index 000..ddcbd2a
--- /dev/null
+++ b/json-writer.c
@@ -0,0 +1,395 @@
+#include "cache.h"
+#include "json-writer.h"
+
+void jw_init(struct json_writer *jw)
+{
+   strbuf_reset(&jw->json);
+   strbuf_reset(&jw->open_stack);
+   jw->first = 0;
+   jw->pretty = 0;
+}
+
+/*
+ * Append JSON-quoted version of the given string to 'out'.
+ */
+static void append_quoted_string(struct strbuf *out, const char *in)
+{
+   unsigned char c;
+
+   strbuf_addch(out, '"');
+   while ((c = *in++) != '\0') {
+   if (c == '"')
+   strbuf_addstr(out, "\\\"");
+   else if (c == '\\')
+   strbuf_addstr(out, "\\\\");
+   else if (c == '\n')
+   strbuf_addstr(out, "\\n");
+   else if (c == '\r')
+   strbuf_addstr(out, "\\r");
+   else if (c == '\t')
+   strbuf_addstr(out, "\\t");
+   else if (c == '\f')
+   strbuf_addstr(out, "\\f");
+   else if (c == '\b')
+   strbuf_addstr(out, "\\b");
+   else if (c < 0x20)
+   strbuf_addf(out, "\\u%04x", c);
+   else
+   strbuf_addch(out, c);
+   }
+   strbuf_addch(out, '"');
+}
+
+static inline void indent_pretty(struct json_writer *jw)
+{
+   int k;
+
+   if (!jw->pretty)
+   return;
+
+   for (k = 0; k < jw->open_stack.len; k++)
+   strbuf_addstr(&jw->json, "  ");
+}
+
+static inline void begin(struct json_writer *jw, char ch_open, int pretty)
+{
+   jw->pretty = pretty;
+   jw->first = 1;
+
+   strbuf_addch(&jw->json, ch_open);
+
+   strbuf_addch(&jw->open_stack, ch_open);
+}
+
+/*
+ * Assert that the top of the open-stack is an object.
+ */
+static inline void assert_in_object(const struct json_writer *jw, const char *key)
+{
+   if (!jw->open_stack.len)
+   die("json-writer: object: missing jw_object_begin(): '%s'", key);
+   if (jw->open_stack.buf[jw->open_stack.len - 1] != '{')
+   die("json-writer: object: not in object: '%s'", key);
+}
+
+/*
+ * Assert that the top of the open-stack is an array.
+ */
+static inline void assert_in_array(const struct json_writer *jw)
+{
+   if (!jw->open_stack.len)
+   die("json-writer: array: missing jw_array_begin()");
+   if (jw->open_stack.buf[jw->open_stack.len - 1] != '[')
+   die("json-writer: array: not in array");
+}
+
+/*
+ * Add comma if we have already seen a member at this level.
+ */
+static inline 

[PATCH v4] routines to generate JSON data

2018-03-26 Thread git
From: Jeff Hostetler 

This is version 4 of my JSON data format routines.

This version adds a "pretty" formatted output.  I consider this to be
mainly for debugging, but worth keeping available in release builds.

I simplified the stack-level tracing as suggested by René Scharfe and
hinted at by Peff.

I converted the _double() routines to take an integer precision rather
than a format specification and build a known-to-be-good format string
to minimize the __attribute__(...) issues raised by René Scharfe.

It fixes the PRIuMAX and "void inline" compiler warnings on OSX that
were reported by Wink Saville and Ramsay Jones.  And resolved the "sparse"
warnings repoted by Ramsay Jones.

And I updated the commit message and header file documentation to address
the JSON-like (Unicode) limitations mentioned by Jonathan Nieder.

Jeff Hostetler (1):
  json_writer: new routines to create data in JSON format

 Makefile|   2 +
 json-writer.c   | 395 +
 json-writer.h   |  92 +++
 t/helper/test-json-writer.c | 590 
 t/t0019-json-writer.sh  | 253 +++
 5 files changed, 1332 insertions(+)
 create mode 100644 json-writer.c
 create mode 100644 json-writer.h
 create mode 100644 t/helper/test-json-writer.c
 create mode 100755 t/t0019-json-writer.sh

-- 
2.9.3



Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-26 Thread Johannes Schindelin
Hi Buga,

On Fri, 16 Mar 2018, Igor Djordjevic wrote:

> [...]
>
> Yes, having more steps would mean more power/options to the user, but
> more complexity to explain to and guide him through as well, not really
> sure where the line should be drawn - for the first time, at least.

If you want to avoid having a huge discussion with me about bias, male
privilege and how unaware most men are of it, and how it excludes half the
potential usership/talented developers, and how representation matters --
and believe me, you do want to avoid this discussion -- you will want to
avoid referring to the user as a "he".

It might be true in your case. But it is also true in your case that you
are Russian. Yet you write English here, probably to avoid excluding
people from the discussion. And you should demonstrate the same courtesy
to people who do not happen to identify with the same gender as you do.

In short: every time you think of a user or a developer as "he", take a
step back and reflect how excluded *you* would feel if someone forced you
to change that to "she". That is exactly how much you exclude non-males if
you think of them as "he". Just don't.

The remedy is easy: use the gender-neutral "they". Which is, by the way,
not a modern invention as of late (in contrast to the "he" you use):
https://en.wikipedia.org/wiki/Singular_they#Older_usage

Ciao,
Dscho

