Re: RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-18 Thread Junio C Hamano
Jonathan Tan  writes:

> For those interested in partial clones and/or missing objects in repos,
> I've updated my original partialclone patches to not require an explicit
> list of promises. Fetch/clone still only permits exclusion of blobs, but
> the infrastructure is there for a local repo to support missing trees
> and commits as well.
> ...
> Demo
> 
>
> Obtain a repository.
>
> $ make prefix=$HOME/local install
> $ cd $HOME/tmp
> $ git clone https://github.com/git/git
>
> Make it advertise the new feature and allow requests for arbitrary blobs.
>
> $ git -C git config uploadpack.advertiseblobmaxbytes 1
> $ git -C git config uploadpack.allowanysha1inwant 1
>
> Perform the partial clone and check that it is indeed smaller. Specify
> "file://" in order to test the partial clone mechanism. (If not, Git will
> perform a local clone, which unselectively copies every object.)
>
> $ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2
> $ git clone "file://$(pwd)/git" git3
> $ du -sh git2 git3
> 116M  git2
> 129M  git3
>
> Observe that the new repo is automatically configured to fetch missing objects
> from the original repo. Subsequent fetches will also be partial.
>
> $ cat git2/.git/config
> [core]
>   repositoryformatversion = 1
>   filemode = true
>   bare = false
>   logallrefupdates = true
> [remote "origin"]
>   url = [snip]
>   fetch = +refs/heads/*:refs/remotes/origin/*
>   blobmaxbytes = 10
> [extensions]
>   partialclone = origin
> [branch "master"]
>   remote = origin
>   merge = refs/heads/master

The above sequence of events make quite a lot of sense.  And the
following description of how it is designed (snipped) is clear
enough (at least to me) to allow me to say that I quite like it.




Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer

2017-09-18 Thread Jeff King
On Mon, Sep 18, 2017 at 04:17:24PM -0700, Jonathan Nieder wrote:

> phionah bugosi wrote:
> 
> > Just to re-echo a previous change requested before in one of the mail
> > threads, we currently have two global variables declared:
> >
> > struct mru packed_git_mru_storage;
> > > struct mru *packed_git_mru = &packed_git_mru_storage;
> >
> > We normally use pointers in C to point or refer to the same location
> > or space in memory from multiple places. That means in situations
> > where we will have the pointer be assigned to in many places in our
> > > code. But in this case, we do not have any other logic referring or
> > > assigning to the pointer packed_git_mru. In simple words, we actually
> > > don't need a pointer in this case.
> >
> > Therefore this patch makes packed_git_mru global a value instead of
> > a pointer and makes use of list.h
> >
> > Signed-off-by: phionah bugosi 
> > ---
> >  builtin/pack-objects.c |  5 +++--
> >  cache.h|  7 ---
> >  list.h |  6 ++
> >  packfile.c | 12 ++--
> >  4 files changed, 15 insertions(+), 15 deletions(-)
> 
> *puzzled* This appears to already be in "pu", with a different author.
> Did you independently make the same change?  Or are you asking for a
> progress report on that change, and just phrasing it in a confusing
> way?

I pointed Phionah at your #leftoverbits comment in:

  https://public-inbox.org/git/20170912172839.gb144...@aiede.mtv.corp.google.com/

about moving packed_git_mru to list.h. But I'm afraid I wasn't very
clear in giving further guidance.

The goal is to build on _top_ of the patch in that message, and convert
the doubly-linked list implementation used in "struct mru" to use the
shared code in list.h. That should mostly involve touching mru.c and
mru.h, though I think we'd need to tweak the name of the "next" pointer
during the traversal, too.

-Peff


Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors

2017-09-18 Thread Jeff King
On Mon, Sep 18, 2017 at 07:46:24PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > When we fail to open $GIT_DIR/info/alternates, we silently
> > assume there are no alternates. This is the right thing to
> > do for ENOENT, but not for other errors.
> >
> > A hard error is probably overkill here. If we fail to read
> > an alternates file then either we'll complete our operation
> > anyway, or we'll fail to find some needed object. Either
> > way, a warning is good idea. And we already have a helper
> > function to handle this pattern; let's just call
> > warn_on_fopen_error().
> 
> I think I prefer a hard error.  What kind of cases are you imagining
> where it would be better to warn?
> 
> E.g. for EIO, erroring out so that the user can try again seems better
> than hoping that the application will be able to cope with the more
> subtle error that comes from discovering some objects are missing.
> 
> For EACCES, I can see how it makes sense to warn and move on, but no
> other errors like that are occurring to me.
> 
> Thoughts?

Yes, EACCES is exactly the example that came to mind.  But most
importantly, we don't know at this point whether the alternate is even
relevant to the current operation. So calling die() here would make some
cases fail that would otherwise succeed. That's usually not a big deal,
but we've had cases where being lazy about dying helps people use git
itself to help repair the case.

Admittedly most of those chicken-and-egg problems are centered around
git-config (e.g., using "git config --edit" to repair broken config), so
it's perhaps less important here. But it seems like a reasonable
principle to follow in general.

If there's a compelling reason to die hard, I'd reconsider it. But I
can't really think of one (if we were _writing_ a new alternates file
and encountered a read error of the old data we're copying, then I think
it would be very important to bail before claiming a successful write.
But that's not what's happening here).

-Peff


Re: [PATCH 1/2] read_info_alternates: read contents into strbuf

2017-09-18 Thread Jeff King
On Mon, Sep 18, 2017 at 07:42:53PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > Reported-by: Michael Haggerty 
> > Signed-off-by: Jeff King 
> > ---
> >  sha1_file.c | 29 +
> >  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> Thanks for tracking it down.

To be fair, Michael did most of the work in identifying and bisecting
the bug. He even wrote a very similar patch in parallel; I just swooped
in at the end.

> > path = xstrfmt("%s/info/alternates", relative_base);
> > -   fd = git_open(path);
> > -   free(path);
> > -   if (fd < 0)
> > -   return;
> > -   if (fstat(fd, &st) || (st.st_size == 0)) {
> > -   close(fd);
> > +   if (strbuf_read_file(&buf, path, 1024) < 0) {
> > +   free(path);
> > return;
> 
> strbuf_read_file is careful to release buf on failure, so this doesn't
> leak (but it's a bit subtle).

Yep. I didn't think it was worth calling out with a comment since the
"don't allocate on failure" pattern is common to most of the strbuf
functions.

> What happened to the !st.st_size case?  Is it eliminated for
> simplicity?

Good question, and the answer is yes. Obviously we can bail early on an
empty file, but I don't think there's any reason to complicate the code
with it (the original seems to come from d5a63b9983 (Alternate object
pool mechanism updates., 2005-08-14), where it avoided a corner case
that has long since been deleted).

-Peff


Re: [PATCH 0/2] fix read past end of array in alternates files

2017-09-18 Thread Jeff King
On Mon, Sep 18, 2017 at 07:36:03PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > This series fixes a regression in v2.11.1 where we might read past the
> > end of an mmap'd buffer. It was introduced in cf3c635210,
> 
> The above information is super helpful.  Can it go in one of the commit
> messages?

Er, didn't I?

> > base the patch on there, for a few reasons:
> >
> >   1. There's a trivial conflict when merging up (because of
> >  git_open_noatime() becoming just git_open() in the interim).
> >
> >   2. The reproduction advice relies on our SANITIZE Makefile knob, which
> >  didn't exist back then.
> >
> >   3. The second patch does not apply there because we don't have
> >  warn_on_fopen_errors(). Though admittedly it could be applied
> >  separately after merging up; it's just a clean-up on top.
> 
> Even this part could go in a commit message, but it's fine for it not
> to.

IMHO this kind of meta information doesn't belong in the commit message.
It's useful to the maintainer to know where to apply the patch, but I
don't think it helps somebody who is reading "git log" output.

-Peff


Re: [PATCH v2] Improve performance of git status --ignored

2017-09-18 Thread Junio C Hamano
Jameson Miller  writes:

> Improve the performance of the directory listing logic when it wants to list
> non-empty ignored directories. In order to show non-empty ignored directories,
> the existing logic will recursively iterate through all contents of an ignored
> directory. This change introduces the optimization to stop iterating through
> the contents once it finds the first file.

Wow, such an obviously correct optimization.  Very nicely explained, too.

> This can have a significant
> improvement in 'git status --ignored' performance in repositories with a large
> number of files in ignored directories.
>
> For an example of the performance difference on an example repository with
> 196,000 files in 400 ignored directories:
>
> | Command|  Time (s) |
> | -- | - |
> | git status |   1.2 |
> | git status --ignored (old) |   3.9 |
> | git status --ignored (new) |   1.4 |
>
> Signed-off-by: Jameson Miller 
> ---

I wish all the contributions I have to accept are as nicely done as
this one ;-)

Thanks.

>  dir.c | 47 +--
>  1 file changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 1c55dc3..1d17b80 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -49,7 +49,7 @@ struct cached_dir {
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>   struct index_state *istate, const char *path, int len,
>   struct untracked_cache_dir *untracked,
> - int check_only, const struct pathspec *pathspec);
> > + int check_only, int stop_at_first_file, const struct pathspec *pathspec);

We might want to make check_only and stop_at_first_file into a
single "unsigned flags" used as a collection of bits, but we can
wait until we start feeling the urge to add the third boolean
parameter to this function (at which point I'd probably demand a
preliminary clean-up to merge these two into a single flags word
before adding the third one as a new bit in that word).

> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  
>   untracked = lookup_untracked(dir->untracked, untracked,
>dirname + baselen, len - baselen);
> +
> + /*
> +  * If this is an excluded directory, then we only need to check if
> +  * the directory contains any files.
> +  */
>   return read_directory_recursive(dir, istate, dirname, len,
> - untracked, 1, pathspec);
> + untracked, 1, exclude, pathspec);

Nicely explained in the in-code comment.

I'd assume that you want your microsoft e-mail address used on the
signed-off-by line appear as the author, so I'll tweak this a bit to
make it so (otherwise, your 8...@gmail.com would become the author).

Thanks.


Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository

2017-09-18 Thread Junio C Hamano
Jonathan Nieder  writes:

>>  test_expect_success 'showing the superproject correctly' '
>
> With the two tweaks mentioned above,
> Reviewed-by: Jonathan Nieder 

I agree with the fixes to the test titles suggested, so I'll queue
the patch with the fixes squashed in.  Hearing "yeah, the titles
were copy-pasted without adjusting, thanks for fixing, Jonathan!"
sent by Øystein would be super nice.

Thanks, both.




Re: [PATCH] t/README: fix typo and grammatically improve a sentence

2017-09-18 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Signed-off-by: Kaartic Sivaraam 
> ---
>  t/README | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/t/README b/t/README
> index 2f95860369751..4b079e4494d93 100644
> --- a/t/README
> +++ b/t/README
> @@ -265,12 +265,12 @@ or:
>  
>  $ sh ./t9200-git-cvsexport-commit.sh --run='-3 21'
>  
> -As noted above, the test set is built going though items left to
> -right, so this:
> +As noted above, the test set is built by going through the items
> +from left to right, so this:
>  
>  $ sh ./t9200-git-cvsexport-commit.sh --run='1-4 !3'
>  
> -will run tests 1, 2, and 4.  Items that comes later have higher
> +will run tests 1, 2, and 4.  Items that come later have higher
>  precedence.  It means that this:
>  
>  $ sh ./t9200-git-cvsexport-commit.sh --run='!3 1-4'
>
> --
> https://github.com/git/git/pull/404

Both of these look to me obvious improvements.  I'll queue them
unless other people object.

Thanks.


Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-18 Thread Junio C Hamano
Junio C Hamano  writes:

> But the point is that we do not want such an overhead in core, as
> all of that is a useless waste of the cycle for a site that wants to
> store the push certificate away outside of the repository itself.

By this I do not mean that cert blobs shouldn't become part of the
in-repository record in _all_ installations that receive signed push
certificates.  An easy to reuse example shipped together with our
source would be a good thing, and an easy to enable sample hook may
even be better.  It's just that I do not want to have any "solution"
in the core part that everybody that wants to accept push certs must
run, if the "solution" is not something we can endorse as the best
current practice.  And I do not yet see how anything along the lines
of the patch that started this thread, or its extention by wrapping
them with commit chain, would become that "best current practice".

A sample hook, on the other hand, is simpler for people to experiment
with, and we might even come up with a universally useful best current
practice out of such an experiment, though.


Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted

2017-09-18 Thread Junio C Hamano
Ian Campbell  writes:

> This can be useful e.g. in `filter-branch` when rewriting tags produced by
> older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel
> source tree:
>
> $ git cat-file tag v2.6.12-rc2
> object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
> type commit
> tag v2.6.12-rc2
>
> Linux v2.6.12-rc2 release
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc
> wznDbFU45A54dZC8RZ5JxyE=
> =ESRP
> -----END PGP SIGNATURE-----
>
> $ git cat-file tag v2.6.12-rc2 | git mktag
> error: char76: could not find "tagger "
> fatal: invalid tag signature file
> $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger
> 9e734775f7c22d2f89943ad6c745571f1930105f
>
> To that end, pass the new option to `mktag` in `filter-branch`.

Hmph.  I cannot shake this nagging feeling that this is probably a
solution that is overly narrow to a single problem that won't scale
into the future.

As there is no guarantee that "missing-tagger" will stay the only
kind of broken tag object that we produce in one version, notice as
a mistake, and then forbid in another version, the approach of
adding an '--allow-missing-tagger' option would imply that we'd end
up adding more and more such options, and filter-branch will need
to use all these '--allow-this-breakage' options we would ever
add.  Even though I fully agree with the problem you are trying
to solve (i.e. we want to be able to replay an old history without
our tool rejecting the data we have), it was my first reaction when
I read this patch.  IOW, my first reaction was "perhaps a single
option '--allow-broken' to cover the currently known and any future
shape of malformat over tag data is more appropriate".

But then, if we look at the body of cmd_mktag(), it looks like this:

    int cmd_mktag(...)
    {
        read input into strbuf buf;
        call verify_tag on buf to sanity check;
        call write_sha1_file() the contents of buf as a tag;
        report the object name;
    }

If we drop the "verification" step from the above, that essentially
becomes an equivalent to "hash-object -t tag -w --stdin".

So I now have to wonder if it may be sufficient to use "hash-object"
in filter-branch, without doing this "allow malformed data that we
would not permit if the tag were being created today, only to help
replaying an old, already broken data" change to "git mktag".

Is there something that makes "hash-object" insufficient (like it
still does some extra checks we would want to disable and cannot
work as a replacement for your "--allow-missing-tagger")?

Thanks.
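As a quick check of that idea, a throwaway-repository experiment (paths and names here are illustrative) shows `hash-object` happily writing a tagger-less tag object of the kind quoted above:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo content >file
git add file
git -c user.name=Test -c user.email=test@example.com commit -qm initial
commit=$(git rev-parse HEAD)

# A tagger-less tag body, like those produced by old Git versions
printf 'object %s\ntype commit\ntag demo\n\nDemo tag\n' "$commit" >tag.txt

# hash-object does not apply mktag's stricter verification
tagid=$(git hash-object -t tag -w --stdin <tag.txt)
git cat-file -t "$tagid"   # prints "tag"
```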

> Signed-off-by: Ian Campbell 
> ---
>  Documentation/git-mktag.txt |   9 +++-
> >  builtin/mktag.c | 100 +---
>  git-filter-branch.sh|   2 +-
>  t/t3800-mktag.sh|  33 ++-
>  4 files changed, 98 insertions(+), 46 deletions(-)
>
> diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
> index fa6a75612..c720c7419 100644
> --- a/Documentation/git-mktag.txt
> +++ b/Documentation/git-mktag.txt
> @@ -9,7 +9,7 @@ git-mktag - Creates a tag object
>  SYNOPSIS
>  
>  [verse]
> -'git mktag'
> +'git mktag' [--allow-missing-tagger]
>  
>  DESCRIPTION
>  ---
> @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header.  The
>  message part may contain a signature that Git itself doesn't
>  care about, but that can be verified with gpg.
>  
> +OPTIONS
> +---
> +--allow-missing-tagger::
> + Allow the `tagger` line in the header to be omitted. This is
> + rarely desirable but may be useful in recreating tags created
> + by older Git.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/builtin/mktag.c b/builtin/mktag.c
> index 031b750f0..0f5dae8d5 100644
> --- a/builtin/mktag.c
> +++ b/builtin/mktag.c
> @@ -1,4 +1,5 @@
>  #include "builtin.h"
> +#include "parse-options.h"
>  #include "tag.h"
>  
>  /*
> @@ -15,6 +16,8 @@
>   * the shortest possible tagger-line.
>   */
>  
> +static int allow_missing_tagger;
> +
>  /*
>   * We refuse to tag something we can't verify. Just because.
>   */
> @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size)
>   unsigned char sha1[20];
>   const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb;
>   size_t len;
> + const unsigned long min_size = allow_missing_tagger ? 71 : 84;
>  
> - if (size < 84)
> + if (size < min_size)
> >   return error("wanna fool me ? you obviously got the size wrong !");
>  
>   buffer[size] = 0;
> @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size)
>   /* Verify the tagger line 

Re: [PATCH 2/2] t9010-*.sh: skip all tests if the PIPE prereq is missing

2017-09-18 Thread Jonathan Nieder
Ramsay Jones wrote:

> Every test in this file, except one, is marked with the PIPE prereq.
> However, that lone test ('set up svn repo'), only performs some setup
> work and checks whether the following test should be executed (by
> setting an additional SVNREPO prerequisite). Since the following test
> also requires the PIPE prerequisite, performing the setup test, when the
> PIPE prerequisite is missing, is simply wasted effort. Use the skip-all
> test facility to skip all tests when the PIPE prerequisite is missing.
>
> Signed-off-by: Ramsay Jones 
> ---
> >  t/t9010-svn-fe.sh | 55 ---
>  1 file changed, 28 insertions(+), 27 deletions(-)

It was always the intention that this test script eventually be able
to run on Windows, but it was not meant to be.  It would need to use a
socket or something for backflow to work on Windows.

Reviewed-by: Jonathan Nieder 


Re: [PATCH 1/2] test-lib: use more compact expression in PIPE prerequisite

2017-09-18 Thread Jonathan Nieder
Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 
> ---
>  t/test-lib.sh | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors

2017-09-18 Thread Jonathan Nieder
Jeff King wrote:

> When we fail to open $GIT_DIR/info/alternates, we silently
> assume there are no alternates. This is the right thing to
> do for ENOENT, but not for other errors.
>
> A hard error is probably overkill here. If we fail to read
> an alternates file then either we'll complete our operation
> anyway, or we'll fail to find some needed object. Either
> way, a warning is good idea. And we already have a helper
> function to handle this pattern; let's just call
> warn_on_fopen_error().

I think I prefer a hard error.  What kind of cases are you imagining
where it would be better to warn?

E.g. for EIO, erroring out so that the user can try again seems better
than hoping that the application will be able to cope with the more
subtle error that comes from discovering some objects are missing.

For EACCES, I can see how it makes sense to warn and move on, but no
other errors like that are occurring to me.

Thoughts?

Thanks,
Jonathan

> Note that technically the errno from strbuf_read_file()
> might be from a read() error, not open(). But since read()
> would never return ENOENT or ENOTDIR, and since it produces
> a generic "unable to access" error, it's suitable for
> handling errors from either.


Re: [PATCH 1/2] read_info_alternates: read contents into strbuf

2017-09-18 Thread Jonathan Nieder
Jeff King wrote:

> Reported-by: Michael Haggerty 
> Signed-off-by: Jeff King 
> ---
>  sha1_file.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)

Thanks for tracking it down.

Reviewed-by: Jonathan Nieder 

[...]
> +++ b/sha1_file.c
[...]
> @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int 
> len, int sep,
>  
>  static void read_info_alternates(const char * relative_base, int depth)
>  {
> - char *map;
> - size_t mapsz;
> - struct stat st;
>   char *path;
> - int fd;
> + struct strbuf buf = STRBUF_INIT;
>  
>   path = xstrfmt("%s/info/alternates", relative_base);
> - fd = git_open(path);
> - free(path);
> - if (fd < 0)
> - return;
> - if (fstat(fd, &st) || (st.st_size == 0)) {
> - close(fd);
> + if (strbuf_read_file(&buf, path, 1024) < 0) {
> + free(path);
>   return;

strbuf_read_file is careful to release buf on failure, so this doesn't
leak (but it's a bit subtle).

What happened to the !st.st_size case?  Is it eliminated for
simplicity?

The rest looks good.

Thanks,
Jonathan


Re: [PATCH 0/2] fix read past end of array in alternates files

2017-09-18 Thread Jonathan Nieder
Jeff King wrote:

> This series fixes a regression in v2.11.1 where we might read past the
> end of an mmap'd buffer. It was introduced in cf3c635210,

The above information is super helpful.  Can it go in one of the commit
messages?

>   but I didn't
> base the patch on there, for a few reasons:
>
>   1. There's a trivial conflict when merging up (because of
>  git_open_noatime() becoming just git_open() in the interim).
>
>   2. The reproduction advice relies on our SANITIZE Makefile knob, which
>  didn't exist back then.
>
>   3. The second patch does not apply there because we don't have
>  warn_on_fopen_errors(). Though admittedly it could be applied
>  separately after merging up; it's just a clean-up on top.

Even this part could go in a commit message, but it's fine for it not
to.

Thanks,
Jonathan


Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository

2017-09-18 Thread Jonathan Nieder
Hi,

Øystein Walle wrote:

> Running `git fetch --unshallow` on a repo that is not in fact shallow
> produces a fatal error message.

Hm, can you say more about the context?  From a certain point of view,
it might make sense for that command to succeed instead: if the repo
is already unshallow, then why should "fetch --unshallow" complain
instead of declaring victory?

> Add a helper to rev-parse that scripters
> can use to determine whether a repo is shallow or not.
>
> Signed-off-by: Øystein Walle 
> ---
>  Documentation/git-rev-parse.txt |  3 +++
>  builtin/rev-parse.c |  5 +
>  t/t1500-rev-parse.sh| 15 +++
>  3 files changed, 23 insertions(+)

Regardless, this new rev-parse --is-shallow helper looks like a good
feature.
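For instance, with a Git that has this patch, a script can probe shallowness before deciding what to do (a minimal sketch; a freshly created repository reports `false`):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git rev-parse --is-shallow-repository   # prints "false"

# a scripted guard might then look like:
if test "$(git rev-parse --is-shallow-repository)" = true
then
	git fetch --unshallow
fi
```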

[...]
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>   : "false");
>   continue;
>   }
> + if (!strcmp(arg, "--is-shallow-repository")) {
> + printf("%s\n", is_repository_shallow() ? "true"
> + : "false");
> + continue;
> + }

The implementation is straightforward and correct.

[...]
> --- a/t/t1500-rev-parse.sh
> +++ b/t/t1500-rev-parse.sh

Thanks for writing tests. \o/

> @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'git-path shallow repository' '

What does git-path mean here?  I wonder if it's a copy/paste error.
Did you mean something like

 test_expect_success 'rev-parse --is-shallow-repository in shallow repo' '

?

> + test_commit test_commit &&
> + echo true >expect &&
> + git clone --depth 1 --no-local . shallow &&
> + test_when_finished "rm -rf shallow" &&
> + git -C shallow rev-parse --is-shallow-repository >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'git-path notshallow repository' '

Likewise: should this be

 test_expect_success 'rev-parse --is-shallow-repository in non-shallow repo' '

?

> + echo false >expect &&
> + git rev-parse --is-shallow-repository >actual &&
> + test_cmp expect actual
> +'
> +
>  test_expect_success 'showing the superproject correctly' '

With the two tweaks mentioned above,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-18 Thread Junio C Hamano
Johannes Schindelin  writes:

>> Do you have a concrete suggestion to make these individual entries more
>> helpful for people who may want go back to the original thread in doing
>> so?  In-reply-to: or References: fields of the "What's cooking" report
>> would not help.  I often have the message IDs that made/helped me make
>> these individual comments in the description; they alone would not react
>> to mouse clicks, though.
>
> Oh gawd, not even more stuff piled onto the mail format. Please stop.
> ...
> It probably tries to serve too many purposes at the same time, and thereby
> none.

Well, this was started as my attempt to give a public service that
shows a summary of what is happening in the entire integration tree,
as there was nothing like that before (and going to github.com and
looking at 'pu' branch would not give you an easy overview).  As
many people contribute many topics to the project, complaining that
it talks about too many topics would not get you anywhere.

If you find "What's cooking" report not serving your needs, and if
no one finds it not serving his or her needs, then I can stop
sending these out, of course, but I am not getting the impression
that we are at that point, at least not yet.


Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

2017-09-18 Thread Junio C Hamano
Lars Schneider  writes:

> SZEDER noticing a bug in this series that I was about to fix:
> https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/
>
> I assume at this point a new patch is better than a re-roll, right?

Thanks, yes indeed.


Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-18 Thread Junio C Hamano
Santiago Torres  writes:

> - *if there is a hook* the blob is computed, but it is up to the
>   hook itself to store it *somewhere*. This makes me feel like it's
>   somewhat of a useless waste of computation if the hook is not
>   meant to handle it anyway(which is just a post-receive hook). I
>   find it rather weird that --signed is a builtin flag, and is
>   handled on the server side only partially (just my two cents).

The way it was envisioned to be used is that the repository meant to
be protected by collected push certs may not be trusted as the
permanent store for push certs by all hosting sites.  The hook may
be told the name of a blob to read its contents and is expected to
store it away to somewhere else.

The only reason why we use blob is because creating a blob in
respose to pushes being executed in parallel will result in
different blobs unless there is hash collision.  Instead of us
having to come up with and use a different mechanism to create a
unique temporary filename and feed that to hook, reusing blob as
such was the simplest.
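To illustrate the envisioned flow, the following sketch simulates receive-pack's side by hand (in reality receive-pack itself writes the blob and exports `GIT_PUSH_CERT` to the hooks; the certificate contents and copy destination here are made up):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# What receive-pack does for us: store the cert as a blob and export
# its name in GIT_PUSH_CERT for the hooks to read.
GIT_PUSH_CERT=$(printf 'certificate version 0.1\npusher someone\n' |
	git hash-object -w --stdin)
export GIT_PUSH_CERT

# Hook body: copy the cert out of the repository to a permanent store.
git cat-file blob "$GIT_PUSH_CERT" >"$tmp/cert.copy"
cat "$tmp/cert.copy"
```

Because the blob's name is its content hash, certificates from parallel pushes never collide, which is the point Junio makes above.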

> I understand the concurrency concerns, so I agree with stefan's
> solution, although I don't know how big of a deal it would be, if git
> already supports --atomic pushes (admittedly, I haven't checked if there
> are any guards in place for someone who pushes millions of refs
> atomically). It'd be completely understandable to have a setting to
> disable handling of --signed pushes and, ideally, a way to warn the user
> of this feature being disabled if --signed is sent.

I do not think atomic helps at all, when one atomic push updates
branch A while another atomic push updates branch B.  They can still
go in parallel, and their certificates must both be stored.  You can
somehow serialize them and create a single strand of pearls to
advance a single ref, or you can let both to fork two histories to
store the push certs from these two pushes and have somebody create
a merge commit to join the history.  

But the point is that we do not want such an overhead in core, as
all of that is a useless waste of the cycle for a site that wants to
store the push certificate away outside of the repository itself.


Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()

2017-09-18 Thread Junio C Hamano
Derrick Stolee  writes:

>> But I do not think we want this "clever" optimization that involves
>> 'n' in the first place.
>>> +   while (count++ < 10) {
>>> +   for (i = 0; i < n; i++)
>>> +   ((unsigned int*)oid.hash)[i] = hash_base;
>>
>> Does it make sense to assume that uint is always 4-byte (so this
>> code won't work if it is 8-byte on your platform) and doing this is
>> faster than using platform-optimized memcpy()?
>
> I'm not sure what you mean by using memcpy to improve this, because
> it would require calling memcpy in the inner loop, such as
>
>   for (i = 0; i < n; i++)
> memcpy(oid.hash + i * sizeof(unsigned), &hash_base,
>  sizeof(unsigned));

Sorry, I left it without saying as I thought it was obvious, but
what I meant was to use a whole "struct oid", not just a single
unsigned (repeated), as the hash that is tested.  If you have an
array of object names you use in the test, then

    for (count = 0; count < limit; count++) {
        hashcpy(&oid, &oids[count]);

        ... do the probing ...
    }

> First, this doesn't just measure the time it takes to determine non-
> existence,

Sorry, my phrasing was indeed misleading.  I know the time we spend
to see if we have or do not have the object is the largest cycle
spender in these codepaths (and even if it were, I do not think that
is what you are trying to optimize in these patches anyway).  

But if I recall correctly, the way we come up with the unique
abbreviation for an object that exists and an object that does not
exist are different?  And because most of the time (think: "git log
-p" output) we would be finding abbreviation for objects that we do
have, benchmarking and tweaking the code that comes up with an
object that does not exist is not optimizing for the right case.

Back when I wrote that initial response, I didn't check how
different the code was between the two cases, but now I did.  It
seems that in both cases we start from the shortest-allowed length
and then extend the same way, and the only difference between these
two cases is that we return immediately when our candidate name is
long enough not to match any existing object when the full name
refers to an object we do not have, while we return only when
the ambiguity is resolved.  I _think_ these amount to the same
computation (i.e. when an object with the full name we have exists,
the amount of computation we need to come up with its unique
abbreviation is the same as the computation we need to find the
unique abbreviation for the same name in another repository that has
identical set of objects, minus that single object), so from that
point of view, throwing random hashes, most of which would not name
any existing object, and measuring how much time it takes to run
get_short_oid() to compute the minimum length of the unique prefix
should be sufficient.
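The neighbor-comparison core of that computation can be modeled outside of git (a toy sketch, not get_short_oid() itself; all names here are made up): over a sorted list of hex names, the minimum unique prefix of an entry is determined solely by its two sorted neighbors, which is why the existing and non-existing cases cost about the same.

```c
#include <assert.h>
#include <string.h>

/* length of the common prefix of two NUL-terminated hex strings */
static size_t common_prefix_len(const char *a, const char *b)
{
	size_t n = 0;
	while (a[n] && a[n] == b[n])
		n++;
	return n;
}

/*
 * Toy model of unique-abbreviation search: given a sorted array of
 * hex object names, return the shortest prefix length of names[idx]
 * that matches no other entry.  Only the two neighbors in sorted
 * order need to be checked.
 */
size_t min_unique_prefix(const char **names, size_t nr, size_t idx)
{
	size_t need = 0;
	if (idx > 0)
		need = common_prefix_len(names[idx], names[idx - 1]) + 1;
	if (idx + 1 < nr) {
		size_t n = common_prefix_len(names[idx], names[idx + 1]) + 1;
		if (n > need)
			need = n;
	}
	return need ? need : 1;
}
```

A query hash that names no existing object is handled the same way: it still only has to be compared against its would-be sorted neighbors.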

Thanks.



Re: [PATCH] describe: teach --match to handle branches and remotes

2017-09-18 Thread Jacob Keller
On Mon, Sep 18, 2017 at 4:52 PM, Junio C Hamano  wrote:
> Max Kirillov  writes:
>
>>  --match ::
>>   Only consider tags matching the given `glob(7)` pattern,
>> - excluding the "refs/tags/" prefix.  This can be used to avoid
>> - leaking private tags from the repository. If given multiple times, a
>> - list of patterns will be accumulated, and tags matching any of the
>> - patterns will be considered. Use `--no-match` to clear and reset the
>> - list of patterns.
>> + excluding the "refs/tags/" prefix. If used with `--all`, it also
>> + considers local branches and remote-tracking references matching the
>> + pattern, excluding respectively "refs/heads/" and "refs/remotes/"
>> + prefix; references of other types are never considered. If given
>> + multiple times, a list of patterns will be accumulated, and tags
>> + matching any of the patterns will be considered.  Use `--no-match` to
>> + clear and reset the list of patterns.
>>
>>  --exclude ::
>>   Do not consider tags matching the given `glob(7)` pattern, excluding
>> - the "refs/tags/" prefix. This can be used to narrow the tag space and
>> - find only tags matching some meaningful criteria. If given multiple
>> - times, a list of patterns will be accumulated and tags matching any
>> - of the patterns will be excluded. When combined with --match a tag will
>> - be considered when it matches at least one --match pattern and does not
>> + the "refs/tags/" prefix. If used with `--all`, it also does not consider
>> + local branches and remote-tracking references matching the pattern,
>> + excluding respectively "refs/heads/" and "refs/remotes/" prefix;
>> + references of other types are never considered. If given multiple times,
>> + a list of patterns will be accumulated and tags matching any of the
>> + patterns will be excluded. When combined with --match a tag will be
>> + considered when it matches at least one --match pattern and does not
>>   match any of the --exclude patterns. Use `--no-exclude` to clear and
>>   reset the list of patterns.
>
> OK, I find this written clearly enough.
>
>> diff --git a/builtin/describe.c b/builtin/describe.c
>> index 94ff2fba0b..2a2e998063 100644
>> --- a/builtin/describe.c
>> +++ b/builtin/describe.c
>> @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path,
>>   }
>>  }
>>
>> +/* Drops prefix. Returns NULL if the path is not expected with current settings. */
>> +static const char *get_path_to_match(int is_tag, int all, const char *path)
>> +{
>> + if (is_tag)
>> + return path + 10;
>
> This is a faithful conversion of the existing code that wants to
> behave the same as original, but a bit more on this later.
>
>> + else if (all) {
>> + if (starts_with(path, "refs/heads/"))
>> + return path + 11; /* "refs/heads/..." */
>> + else if (starts_with(path, "refs/remotes/"))
>> + return path + 13; /* "refs/remotes/..." */
>> + else
>> + return 0;
>
> I think you can use skip_prefix() to avoid counting the length of
> the prefix yourself, i.e.
>
> else if all {
> const char *body;
>
> if (skip_prefix(path, "refs/heads/", &body))
> return body;
> else if (skip_prefix(path, "refs/remotes/", &body))
> ...
> }
>
> Whether you do the above or not, the last one that returns 0 should
> return NULL (to the language it is the same thing, but to humans, we
> write NULL when it is the null pointer, not the number 0).
>
>> + } else
>> + return NULL;
>> +}
>
> Perhaps the whole thing may want to be a bit more simplified, like:
>
> static const char *skip_ref_prefix(const char *path, int all)
> {
> const char *prefix[] = {
> "refs/tags/", "refs/heads/", "refs/remotes/"
> };
> const char *body;
> int cnt;
> int bound = all ? ARRAY_SIZE(prefix) : 1;
>

I found the implicit use of "bound = 1" means "we only care about
tags" to be a bit weird here. I guess it's not really that big a deal
overall, and this is definitely cleaner than the original
implementation.

> for (cnt = 0; cnt < bound; cnt++)
> if (skip_prefix(path, prefix[cnt], &body))
> return body;
> return NULL;
> }
>
> The hardcoded +10 for "is_tag" case assumes that anything other than
> "refs/tags/something" would ever be used to call into this function
> when is_tag is true, and that may well be true in the current code
> and have been so ever since the original code was written, but it
> still smells like an invitation for future bugs.
>
> I dunno.
>
>> +
>>  static 

Re: [PATCH 1/2] read_info_alternates: read contents into strbuf

2017-09-18 Thread Junio C Hamano
Jeff King  writes:

> We could also just make a NUL-terminated copy of the input
> bytes and operate on that. But since all but one caller
> already is passing a string, instead let's just fix that
> caller to provide NUL-terminated input in the first place,
> by swapping out mmap for strbuf_read_file().
> ...
> Let's also drop the "len" parameter entirely from
> link_alt_odb_entries(), since it's completely ignored. That
> will avoid any new callers re-introducing a similar bug.

Both design decisions make perfect sense to me.

>  sha1_file.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)

And diffstat also agrees that it is a good change ;-)


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-18 Thread Stefan Beller
> I am hoping that this last one is not allowed and we can use the
> "same condition is checked every time we loop" version that hides
> the ugliness inside the macro.

By which you are referring to Jonathan's solution posted earlier.
Maybe we can combine the two solutions (checking once that the list
is not NULL, from Jonathan) with an outer structure (SZEDER's
solution) by replacing the condition with a for loop, roughly
(untested):

#define for_each_string_list_item(item,list) \
-   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
+for (; list; list = NULL)
+for (item = (list)->items; item < (list)->items + (list)->nr; ++item)

as that would not run into any dangling-else problem.
It is also just one statement, such that

if (bla)
  for_each_string_list_item {
baz(item);
  }
else
  foo;

still works.

Are there downsides to this combined approach?
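One sanity check, as a sketch with a stand-in list type (not the real string_list): the outer for runs the body at most once, skips it entirely for a NULL list, and the whole construct remains a single statement. One downside does show up: the outer loop NULLs the caller's list variable, a side effect the real macro would need to hide (e.g. behind a shadow variable).

```c
#include <assert.h>
#include <stddef.h>

struct item { int v; };
struct toy_list { struct item *items; size_t nr; };

/*
 * Combined form: the outer for () runs at most once (and not at all
 * for a NULL list); the inner for () is the original iteration.
 * Caveat: this sets the caller's "list" variable to NULL afterwards.
 */
#define for_each_toy_item(item, list) \
	for (; list; list = NULL) \
		for (item = (list)->items; item < (list)->items + (list)->nr; ++item)

int sum_items(struct toy_list *list)
{
	struct item *item;
	int sum = 0;

	/* a single statement: safe directly under an if without braces */
	for_each_toy_item(item, list)
		sum += item->v;
	return sum;
}
```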


Re: [PATCH] describe: teach --match to handle branches and remotes

2017-09-18 Thread Junio C Hamano
Max Kirillov  writes:

>  --match ::
>   Only consider tags matching the given `glob(7)` pattern,
> - excluding the "refs/tags/" prefix.  This can be used to avoid
> - leaking private tags from the repository. If given multiple times, a
> - list of patterns will be accumulated, and tags matching any of the
> - patterns will be considered. Use `--no-match` to clear and reset the
> - list of patterns.
> + excluding the "refs/tags/" prefix. If used with `--all`, it also
> + considers local branches and remote-tracking references matching the
> + pattern, excluding respectively "refs/heads/" and "refs/remotes/"
> + prefix; references of other types are never considered. If given
> + multiple times, a list of patterns will be accumulated, and tags
> + matching any of the patterns will be considered.  Use `--no-match` to
> + clear and reset the list of patterns.
>  
>  --exclude ::
>   Do not consider tags matching the given `glob(7)` pattern, excluding
> - the "refs/tags/" prefix. This can be used to narrow the tag space and
> - find only tags matching some meaningful criteria. If given multiple
> - times, a list of patterns will be accumulated and tags matching any
> - of the patterns will be excluded. When combined with --match a tag will
> - be considered when it matches at least one --match pattern and does not
> + the "refs/tags/" prefix. If used with `--all`, it also does not consider
> + local branches and remote-tracking references matching the pattern,
> + excluding respectively "refs/heads/" and "refs/remotes/" prefix;
> + references of other types are never considered. If given multiple times,
> + a list of patterns will be accumulated and tags matching any of the
> + patterns will be excluded. When combined with --match a tag will be
> + considered when it matches at least one --match pattern and does not
>   match any of the --exclude patterns. Use `--no-exclude` to clear and
>   reset the list of patterns.

OK, I find this written clearly enough.

> diff --git a/builtin/describe.c b/builtin/describe.c
> index 94ff2fba0b..2a2e998063 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path,
>   }
>  }
>  
> +/* Drops prefix. Returns NULL if the path is not expected with current settings. */
> +static const char *get_path_to_match(int is_tag, int all, const char *path)
> +{
> + if (is_tag)
> + return path + 10;

This is a faithful conversion of the existing code that wants to
behave the same as original, but a bit more on this later.

> + else if (all) {
> + if (starts_with(path, "refs/heads/"))
> + return path + 11; /* "refs/heads/..." */
> + else if (starts_with(path, "refs/remotes/"))
> + return path + 13; /* "refs/remotes/..." */
> + else
> + return 0;

I think you can use skip_prefix() to avoid counting the length of
the prefix yourself, i.e.

else if all {
const char *body;

if (skip_prefix(path, "refs/heads/", &body))
return body;
else if (skip_prefix(path, "refs/remotes/", &body))
...
}

Whether you do the above or not, the last one that returns 0 should
return NULL (to the language it is the same thing, but to humans, we
write NULL when it is the null pointer, not the number 0).

> + } else
> + return NULL;
> +}

Perhaps the whole thing may want to be a bit more simplified, like:

static const char *skip_ref_prefix(const char *path, int all)
{
const char *prefix[] = {
"refs/tags/", "refs/heads/", "refs/remotes/"
};
const char *body;
int cnt;
int bound = all ? ARRAY_SIZE(prefix) : 1;

for (cnt = 0; cnt < bound; cnt++)
if (skip_prefix(path, prefix[cnt], &body))
return body;
return NULL;
}

The hardcoded +10 for "is_tag" case assumes that anything other than
"refs/tags/something" would ever be used to call into this function
when is_tag is true, and that may well be true in the current code
and have been so ever since the original code was written, but it
still smells like an invitation for future bugs.

I dunno.
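For what it's worth, the sketch above compiles once the return type and the &body argument are spelled out; here it is as a self-contained rendition, with a stand-in for git's skip_prefix() so it builds outside the git tree:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* stand-in for git's skip_prefix() helper */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);
	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

const char *skip_ref_prefix(const char *path, int all)
{
	const char *prefix[] = {
		"refs/tags/", "refs/heads/", "refs/remotes/"
	};
	const char *body;
	size_t cnt;
	size_t bound = all ? ARRAY_SIZE(prefix) : 1; /* tags only if !all */

	for (cnt = 0; cnt < bound; cnt++)
		if (skip_prefix(path, prefix[cnt], &body))
			return body;
	return NULL;
}
```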

> +
>  static int get_name(const char *path, const struct object_id *oid, int flag, void *cb_data)
>  {
>   int is_tag = starts_with(path, "refs/tags/");
> @@ -140,12 +156,13 @@ static int get_name(const char *path, const struct object_id *oid, int flag, voi
>*/
>   if (exclude_patterns.nr) {
>   struct string_list_item *item;
> + const char *path_to_match = get_path_to_match(is_tag, all, path);


Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer

2017-09-18 Thread Jonathan Nieder
Hi,

phionah bugosi wrote:

> Just to reecho a previous change requested before in one of the mail
> threads, we currently have two global variables declared:
>
> struct mru packed_git_mru_storage;
> struct mru *packed_git_mru = &packed_git_mru_storage;
>
> We normally use pointers in C to point or refer to the same location
> or space in memory from multiple places. That means in situations
> where we will have the pointer be assigned to in many places in our
> code. But in this case, we do not have any other logic refering or
> assigning to the pointer packed_git_mru. In simple words we actually
> dont need a pointer in this case.
>
> Therefore this patch makes packed_git_mru global a value instead of
> a pointer and makes use of list.h
>
> Signed-off-by: phionah bugosi 
> ---
>  builtin/pack-objects.c |  5 +++--
>  cache.h|  7 ---
>  list.h |  6 ++
>  packfile.c | 12 ++--
>  4 files changed, 15 insertions(+), 15 deletions(-)

*puzzled* This appears to already be in "pu", with a different author.
Did you independently make the same change?  Or are you asking for a
progress report on that change, and just phrasing it in a confusing
way?

Confused,
Jonathan


Re: RFC v3: Another proposed hash function transition plan

2017-09-18 Thread Jonathan Nieder
Hi,

Gilles Van Assche wrote:
> Hi Johannes,

>> SHA-256 got much more cryptanalysis than SHA3-256 […].
>
> I do not think this is true. Keccak/SHA-3 actually got (and is still
> getting) a lot of cryptanalysis, with papers published at renowned
> crypto conferences [1].
>
> Keccak/SHA-3 is recognized to have a significant safety margin. E.g.,
> one can cut the number of rounds in half (as in Keyak or KangarooTwelve)
> and still get a very strong function. I don't think we could say the
> same for SHA-256 or SHA-512…

I just wanted to thank you for paying attention to this conversation
and weighing in.

Most of the regulars in the git project are not crypto experts.  This
kind of extra information (and e.g. [2]) is very useful to us.

Thanks,
Jonathan

> Kind regards,
> Gilles, for the Keccak team
>
> [1] https://keccak.team/third_party.html
[2] https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/


Re: RFC v3: Another proposed hash function transition plan

2017-09-18 Thread Johannes Schindelin
Hi Gilles,

On Mon, 18 Sep 2017, Gilles Van Assche wrote:

> > SHA-256 got much more cryptanalysis than SHA3-256 […].
> 
> I do not think this is true.

Please read what I said again: SHA-256 got much more cryptanalysis than
SHA3-256.

I never said that SHA3-256 got little cryptanalysis. Personally, I think
that SHA3-256 got a ton more cryptanalysis than SHA-1, and that SHA-256
*still* got more cryptanalysis. But my opinion does not count, really.
However, the two experts I pestered with questions over questions left me
with that strong impression, and their opinion does count.

> Keccak/SHA-3 actually got (and is still getting) a lot of cryptanalysis,
> with papers published at renowned crypto conferences [1].
> 
> Keccak/SHA-3 is recognized to have a significant safety margin. E.g.,
> one can cut the number of rounds in half (as in Keyak or KangarooTwelve)
> and still get a very strong function. I don't think we could say the
> same for SHA-256 or SHA-512…

Again, I do not want to criticize SHA3/Keccak. Personally, I have a lot of
respect for Keccak.

I also have a lot of respect for everybody who scrutinized the SHA2 family
of algorithms.

I also respect the fact that there are more implementations of SHA-256,
and thanks to everybody seeming to demand SHA-256 checksums instead of
SHA-1 or MD5 for downloads, bugs in those implementations are probably
discovered relatively quickly, and I also cannot ignore the prospect of
hardware support for SHA-256.

In any case, having SHA3 as a fallback in case SHA-256 gets broken seems
like a very good safety net to me.

Ciao,
Johannes

Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms

2017-09-18 Thread Stefan Beller
>> instead compared to the proposed configuration above.
>> (Even better yet, then people could play around with "v1 only"
>> and see how it falls apart on old servers)
>
> Except we can't start with an explicit whitelist because we must
> fallback to v0 if v1 isn't supported otherwise we would break people.
>
> That is unless we have the semantics of: If not configured v0 will be
> used, otherwise only use the configured protocol versions.
>

Good point.

Thinking about this, how about:

  If not configured, we do as we want. (i.e. Git has full control over
  its decision-making process, which for now is "favor v0 over v1 as
  we are experimenting with v1". This strategy may change in the future
  to "prefer the highest version number that both client and server can speak".)

  If it is configured, "use highest configured number from the given set".

?


Re: [PATCH v2] add test for bug in git-mv for recursive submodules

2017-09-18 Thread Stefan Beller
>> Took a little while but here is a more clean patch creating individual
>> submodules for the nesting.
>>
>> Cheers Heiko

Thanks for writing this test!

>
> Thanks.  Stefan, does this look good to you now?

Yes, though there are nits below.

> It is not quite clear which step is expected to fail with the
> current code by reading the test or the proposed log message.  Does
> "mv" refuse to work and we do not get to run "status", or does
> "status" report a failure, or do we fail well before that?

git-mv failing seems like a new possibility without incurring
another process spawn with the new repository object.
(Though then we could also just fix the recursed submodule)

> The log message that only says "This does not work when ..." is not
> helpful in figuring it out, either.  Something like "This does not
> work and fails to update the paths for its configurations" or
> whatever that describes "what actually happens" (in contrast to
> "what ought to happen", which you described clearly) should be
> there.
>
> Description on how you happened to have discovered the issue feels a
> lot less relevant compared to that, and it is totally useless if it
> is unclear what the issue is in the first place.
>
>>  t/t7001-mv.sh | 25 +
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
>> index e365d1ff77..cbc5fb37fe 100755
>> --- a/t/t7001-mv.sh
>> +++ b/t/t7001-mv.sh
>> @@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested directories' '
>>   test_cmp actual expect
>>  '
>>
>> +test_expect_failure 'moving nested submodules' '
>> + git commit -am "cleanup commit" &&
>> + mkdir sub_nested_nested &&
>> + (cd sub_nested_nested &&

We seem to have different styles for nested shell. I prefer

  outside command &&
  (
  first nested command here &&
  ...

as that aligns indentation to the nesting level. I have seen
the style you use a lot in the test suite, and we do not have
a guideline in Documentation/CodingGuidelines, so I do not
complain too loudly. ;)


>> + touch nested_level2 &&
>> + git init &&
>> + git add . &&
>> + git commit -m "nested level 2"
>> + ) &&
>> + mkdir sub_nested &&
>> + (cd sub_nested &&
>> + touch nested_level1 &&
>> + git init &&
>> + git add . &&
>> + git commit -m "nested level 1"
>> + git submodule add ../sub_nested_nested &&
>> + git commit -m "add nested level 2"
>> + ) &&
>> + git submodule add ./sub_nested nested_move &&
>> + git commit -m "add nested_move" &&
>> + git submodule update --init --recursive &&

So far a nice setup!

>> + git mv nested_move sub_nested_moved &&

This is the offending command that produces the bug,
as it will break most subsequent commands, such as

>> + git status

git-status is one of the basic commands. Without
status to function, I think it is hard to recover your repo without
a lot of in-depth knowledge of Git (submodules).

I wonder if git-status should complain more gracefully
and fallback to one of --ignore-submodules={dirty, all},
that actually still works.

Maybe we could introduce a new default mode for this
flag, that is "none-except-on-error", though this sounds
as if we're fixing symptoms instead of the root cause.


Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms

2017-09-18 Thread Brandon Williams
On 09/18, Stefan Beller wrote:
> >> From a users POV this may be frustrating as I would imagine that
> >> people want to run
> >>
> >>   git config --global protocol.version 2
> >>
> >> to try it out and then realize that some of their hosts do not speak
> >> 2, so they have to actually configure it per repo/remote.
> >
> > The point would be to be able to set this globally, not per-repo.  Even
> > if a repo doesn't speak v2 then it should be able to gracefully degrade
> > to v1 without the user having to do anything.  The reason why there is
> > this escape hatch is if doing the protocol negotiation out of band
> > causing issues with communicating with a server that it can be shut off.
> 
> In the current situation it is easy to assume that if v1 (and not v0)
> is configured, the user's intent is "to try out v1 and fall back
> gracefully to v0".
> 
> But this will change over time in the future!
> 
> Eventually people will have the desire to say:
> "Use version N+1, but never version N", because N has
> performance or security issues; the user might not want
> to bother to try N or even actively want to be affirmed that
> Git will never use version N.
> 
> In this future we need a mechanism, that either contains a
> white list or black list of protocols. To keep it simple (I assume
> there won't be many protocol versions), a single white list will do.
> 
> However transitioning from the currently proposed "try the new
> configured thing and fallback to whatever" to "this is the exact list
> of options that Git will try for you" will be hard, as we may break people
> if we do not unconditionally fall back to v0.
> 
> That is why I propose to start with an explicit white list as then we
> do not have to have a transition plan or otherwise work around the
> issue. Also it doesn't hurt now to use
> 
> git config --global protocol.version v1,v0
> 
> instead compared to the proposed configuration above.
> (Even better yet, then people could play around with "v1 only"
> and see how it falls apart on old servers)

Except we can't start with an explicit whitelist because we must
fallback to v0 if v1 isn't supported otherwise we would break people.

That is unless we have the semantics of: If not configured v0 will be
used, otherwise only use the configured protocol versions.

-- 
Brandon Williams


Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms

2017-09-18 Thread Stefan Beller
>> From a users POV this may be frustrating as I would imagine that
>> people want to run
>>
>>   git config --global protocol.version 2
>>
>> to try it out and then realize that some of their hosts do not speak
>> 2, so they have to actually configure it per repo/remote.
>
> The point would be to be able to set this globally, not per-repo.  Even
> if a repo doesn't speak v2 then it should be able to gracefully degrade
> to v1 without the user having to do anything.  The reason why there is
> this escape hatch is if doing the protocol negotiation out of band
> causing issues with communicating with a server that it can be shut off.

In the current situation it is easy to assume that if v1 (and not v0)
is configured, the user's intent is "to try out v1 and fall back
gracefully to v0".

But this will change over time in the future!

Eventually people will have the desire to say:
"Use version N+1, but never version N", because N has
performance or security issues; the user might not want
to bother to try N or even actively want to be affirmed that
Git will never use version N.

In this future we need a mechanism, that either contains a
white list or black list of protocols. To keep it simple (I assume
there won't be many protocol versions), a single white list will do.

However transitioning from the currently proposed "try the new
configured thing and fallback to whatever" to "this is the exact list
of options that Git will try for you" will be hard, as we may break people
if we do not unconditionally fall back to v0.

That is why I propose to start with an explicit white list as then we
do not have to have a transition plan or otherwise work around the
issue. Also it doesn't hurt now to use

git config --global protocol.version v1,v0

instead compared to the proposed configuration above.
(Even better yet, then people could play around with "v1 only"
and see how it falls apart on old servers)
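The "use the highest configured version from the given set" semantics could be prototyped roughly as below (a hypothetical helper, not an existing git function; the name, the `"v1,v0"` format, and the version set are all made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: pick the highest protocol version out of a
 * comma-separated whitelist such as "v1,v0".  Returns -1 if nothing
 * in the list is recognized.  An unset (NULL) configuration means
 * "Git decides", per the discussion above.
 */
int select_protocol_version(const char *configured)
{
	const char *p = configured;
	int best = -1;

	if (!p)
		return best; /* caller falls back to the built-in default */

	while (*p) {
		int ver = -1;
		if (!strncmp(p, "v0", 2) && (p[2] == ',' || p[2] == '\0'))
			ver = 0;
		else if (!strncmp(p, "v1", 2) && (p[2] == ',' || p[2] == '\0'))
			ver = 1;
		if (ver > best)
			best = ver;
		p = strchr(p, ',');
		if (!p)
			break;
		p++;
	}
	return best;
}
```

With such a helper, "v1 only" configurations would simply yield 1 with no fallback entry, which is exactly the behavior one would want to experiment with against old servers.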


RE: [PATCH v6 12/12] fsmonitor: add a performance test

2017-09-18 Thread Ben Peart

> -Original Message-
> From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de]
> Sent: Monday, September 18, 2017 10:25 AM
> To: Ben Peart 
> Cc: david.tur...@twosigma.com; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v6 12/12] fsmonitor: add a performance test
> 
> Hi Ben,
> 
> sorry for not catching this earlier:
> 
> On Fri, 15 Sep 2017, Ben Peart wrote:
> 
> > [...]
> > +
> > +int cmd_dropcaches(void)
> > +{
> > +   HANDLE hProcess = GetCurrentProcess();
> > +   HANDLE hToken;
> > +   int status;
> > +
> > +   if (!OpenProcessToken(hProcess, TOKEN_QUERY |
> > TOKEN_ADJUST_PRIVILEGES, &hToken))
> > +   return error("Can't open current process token");
> > +
> > +   if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> > +   return error("Can't get SeProfileSingleProcessPrivilege");
> > +
> > +   CloseHandle(hToken);
> > +
> > +   HMODULE ntdll = LoadLibrary("ntdll.dll");
> 

Thanks Johannes, I'll fix that.

> Git's source code still tries to abide by C90, and for simplicity's sake, this
> extends to the Windows-specific part. Therefore, the `ntdll` variable needs to
> be declared at the beginning of the function (I do agree that it makes for
> better code to reduce the scope of variables, but C90 simply did not allow
> variables to be declared in the middle of functions).
> 
> I wanted to send a patch address this in the obvious way, but then I
> encountered these lines:
> 
> > +   DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> > +   (DWORD(WINAPI *)(INT, PVOID,
> ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> > +   if (!NtSetSystemInformation)
> > +   return error("Can't get function addresses, wrong Windows version?");
> 
> It turns out that we have seen this plenty of times in Git for Windows'
> fork, so much so that we came up with a nice helper to make this all a bit
> more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and
> INIT_PROC_ADDR helpers in compat/win32/lazyload.h.
> 
> Maybe this would be the perfect excuse to integrate this patch into upstream
> Git? 

This patch is pretty hefty already.  How about you push this capability
upstream and I take advantage of it in a later patch. :)

> This would be the patch (you can also cherry-pick it from
> 25c4dc3a73352e72e995594cf1b4afa46e93d040 in
> https://github.com/dscho/git):
> 
> -- snip --
> From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00
> 2001
> From: Johannes Schindelin 
> Date: Tue, 10 Jan 2017 23:14:20 +0100
> Subject: [PATCH] Win32: simplify loading of DLL functions
> 
> Dynamic loading of DLL functions is duplicated in several places in Git for
> Windows' source code.
> 
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(<dll>, <return-type>, <function-name>,
> ...<function-parameter-types>) macro to be used at the beginning of a
> code block, and the INIT_PROC_ADDR(<function-name>) macro to call before
> using the declared function. The return value of the INIT_PROC_ADDR() call
> has to be checked; if it is NULL, the function was not found in the specified
> DLL.
> 
> Example:
> 
> DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
>   LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
> 
> if (!INIT_PROC_ADDR(CreateHardLinkW))
> return error("Could not find CreateHardLinkW() function");
> 
>   if (!CreateHardLinkW(source, target, NULL))
>   return error("could not create hardlink from %S to %S",
>source, target);
>   return 0;
> 
> Signed-off-by: Karsten Blees 
> Signed-off-by: Johannes Schindelin 
> ---
>  compat/win32/lazyload.h | 44
> 
>  1 file changed, 44 insertions(+)
>  create mode 100644 compat/win32/lazyload.h
> 
> diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
> new file mode 100644
> index 000..91c10dad2fb
> --- /dev/null
> +++ b/compat/win32/lazyload.h
> @@ -0,0 +1,44 @@
> +#ifndef LAZYLOAD_H
> +#define LAZYLOAD_H
> +
> +/* simplify loading of DLL functions */
> +
> +struct proc_addr {
> + const char *const dll;
> + const char *const function;
> + FARPROC pfunction;
> + unsigned initialized : 1;
> +};
> +
> +/* Declares a function to be loaded dynamically from a DLL. */
> +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
> + static struct proc_addr proc_addr_##function = \
> + { #dll, #function, NULL, 0 }; \
> + static rettype (WINAPI *function)(__VA_ARGS__)
> +
> +/*
> + * Loads a function from a DLL 

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-18 Thread Stefan Beller
On Mon, Sep 18, 2017 at 7:22 AM, Santiago Torres  wrote:
> Hello, everyone.
>
> Sorry for being late in this thread, I was making sure I didn't say
> anything outrageously wrong.
>
>> That's Stefan; I wouldn't have suggested any approach that uses the
>> blob whose sole purpose is to serve as a temporary storage area to
>> pass the information to the hook as part of the permanent record.
>>

I put out a vague design that seemed to have more advantages
in my mind at the time of writing. :)

>
> Hmm, as far as I understand *this* is the status quo. We get an
> ephemeral sha1/oid only if you have a hook attached. Otherwise, you will
> never see the object at all, even if you have --signed set.
>
> I suggested preparing this RFC/Patch because of the following reasons
> (please let me know if my understanding of any of this is wrong...):
>
> - I find it weird that the cli allows for a --signed push and
>   nowhere in the receive-pack's feedback you're told if the push
>   certificate is compute/stored/handled at all. I think that, at the
>   latest, receive pack should let the user know whether the signed
>   push succeeded, or if there is no hook attached to handle it.

How would a user benefit from it?
(Are there cases where the organisation wants the user to not know
deliberately what happened to their push cert? Do we care about these
cases?)

> - *if there is a hook* the blob is computed, but it is up to the
>   hook itself to store it *somewhere*. This makes me feel like it's
>   somewhat of a useless waste of computation if the hook is not
>   meant to handle it anyway(which is just a post-receive hook). I
>   find it rather weird that --signed is a builtin flag, and is
>   handled on the server side only partially (just my two cents).

I agree, but many features in Git start out small and only partially.

> - To me, the way push certificates are handled now feels like having
>   git commit -S producing a detached signature that you have to
>   embed somehow in the resulting commit object. (As a matter of
>   fact, many points on [1] seem to back this notion, and even recall
>   some drawbacks on push certificates the way they are handled
>   today)
>
> I understand the concurrency concerns, so I agree with stefan's
> solution, although I don't know how big of a deal it would be, if git
> already supports --atomic pushes (admittedly, I haven't checked if there
> are any guards in place for someone who pushes millions of refs
> atomically). It'd be completely understandable to have a setting to
> disable hadnling of --signed pushes and, ideally, a way to warn the user
> of this feature being disabled if --signed is sent.

That makes sense.


Re: [PATCH] pack: make packed_git_mru global a value instead of a pointer

2017-09-18 Thread Stefan Beller
On Sun, Sep 17, 2017 at 1:44 PM, Christian Couder
 wrote:
> On Sun, Sep 17, 2017 at 10:42 PM, Christian Couder
>  wrote:
>> On Sun, Sep 17, 2017 at 10:17 PM, phionah bugosi  wrote:
>>> Signed-off-by: phionah bugosi 
>>> ---
>>>  builtin/pack-objects.c |  5 +++--
>>>  cache.h|  7 ---
>>>  list.h |  6 ++
>>>  packfile.c | 12 ++--
>>>  4 files changed, 15 insertions(+), 15 deletions(-)
>>>
>>> This patch makes packed_git_mru global a value instead of a pointer and
>>> makes use of list.h
>>
>> Please explain _why_ you are doing that (why is the resulting code better).
>
> Also the explanations should be in the commit message, that is above
> your "Signed-off-by: ..." line.

A similar patch exists at origin/jn/per-repo-object-store-fixes^^
(I haven't checked if there are differences and if so which
patch is better; all I am noting here is that there has been
work very similar to this one a couple of days prior.)


[PATCH v2] Improve performance of git status --ignored

2017-09-18 Thread Jameson Miller
Improve the performance of the directory listing logic when it wants to list
non-empty ignored directories. In order to show non-empty ignored directories,
the existing logic will recursively iterate through all contents of an ignored
directory. This change introduces the optimization to stop iterating through
the contents once it finds the first file. This can have a significant
improvement in 'git status --ignored' performance in repositories with a large
number of files in ignored directories.

For an example of the performance difference on an example repository with
196,000 files in 400 ignored directories:

| Command                    | Time (s) |
| -------------------------- | -------- |
| git status                 |      1.2 |
| git status --ignored (old) |      3.9 |
| git status --ignored (new) |      1.4 |

Signed-off-by: Jameson Miller 
---
 dir.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 1c55dc3..1d17b80 100644
--- a/dir.c
+++ b/dir.c
@@ -49,7 +49,7 @@ struct cached_dir {
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
struct index_state *istate, const char *path, int len,
struct untracked_cache_dir *untracked,
-   int check_only, const struct pathspec *pathspec);
+   int check_only, int stop_at_first_file, const struct pathspec *pathspec);
 static int get_dtype(struct dirent *de, struct index_state *istate,
 const char *path, int len);
 
@@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
untracked = lookup_untracked(dir->untracked, untracked,
 dirname + baselen, len - baselen);
+
+   /*
+* If this is an excluded directory, then we only need to check if
+* the directory contains any files.
+*/
return read_directory_recursive(dir, istate, dirname, len,
-   untracked, 1, pathspec);
+   untracked, 1, exclude, pathspec);
 }
 
 /*
@@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 * with check_only set.
 */
	return read_directory_recursive(dir, istate, path->buf, path->len,
-   cdir->ucd, 1, pathspec);
+   cdir->ucd, 1, 0, pathspec);
/*
 * We get path_recurse in the first run when
 * directory_exists_in_index() returns index_nonexistent. We
@@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir)
  * Also, we ignore the name ".git" (even if it is not a directory).
  * That likely will not change.
  *
+ * If 'stop_at_first_file' is specified, 'path_excluded' is returned
+ * to signal that a file was found. This is the least significant value that
+ * indicates that a file was encountered that does not depend on the order of
+ * whether an untracked or excluded path was encountered first.
+ *
  * Returns the most significant path_treatment value encountered in the scan.
+ * If 'stop_at_first_file' is specified, `path_excluded` is the most
+ * significant path_treatment value that will be returned.
  */
+
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
struct index_state *istate, const char *base, int baselen,
struct untracked_cache_dir *untracked, int check_only,
-   const struct pathspec *pathspec)
+   int stop_at_first_file, const struct pathspec *pathspec)
 {
struct cached_dir cdir;
enum path_treatment state, subdir_state, dir_state = path_none;
@@ -1832,12 +1845,34 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
subdir_state =
read_directory_recursive(dir, istate, path.buf,
 path.len, ud,
-check_only, pathspec);
+check_only, stop_at_first_file, pathspec);
if (subdir_state > dir_state)
dir_state = subdir_state;
}
 
if (check_only) {
+   if (stop_at_first_file) {
+   /*
+* If stopping at first file, then
+* signal that a file was found by
+* returning `path_excluded`. This is
+* to return a consistent value
+* regardless of whether an ignored or
+* excluded file happened to be
+* encountered 1st.
+

[PATCH v2] Improve performance of git status --ignored

2017-09-18 Thread Jameson Miller
This is the second version of my patches to improve handling of
ignored files

I have decided to break the original patch series into two parts:

 1) Perf improvements to handling ignored directories

 2) Expose extra options to control which ignored files are displayed
 by git status.

This patch will address #1, and I will follow up with another patch
for #2.

This patch improves the performance of 'git status --ignored' by
cutting out unnecessary work when determining if an ignored directory
is empty or not. The current logic will recursively enumerate through
all contents of an ignored directory to determine whether it is empty
or not. The new logic will return after the 1st file is
encountered. This can result in a significant speedup in work dirs
with a lot of files in ignored directories.

As an example of the performance improvement, here is a representative
example of a repository with ~190,000 files in ~400 ignored
directories:

| Command                    | Time (s) |
| -------------------------- | -------- |
| git status                 |      1.2 |
| git status --ignored (old) |      3.9 |
| git status --ignored (new) |      1.4 |

Jameson Miller (1):
  Improve performance of git status --ignored

 dir.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

-- 
2.7.4



[OUTREACHY] pack: make packed_git_mru global a value instead of a pointer

2017-09-18 Thread phionah bugosi
Just to re-echo a change requested earlier in one of the mail threads:
we currently have two global variables declared:

struct mru packed_git_mru_storage;
struct mru *packed_git_mru = &packed_git_mru_storage;

We normally use pointers in C to refer to the same location in memory
from multiple places, that is, in situations where the pointer is
assigned to in many places in our code. In this case, however, no other
logic refers or assigns to the pointer packed_git_mru, so in simple
words we actually don't need a pointer here.

Therefore this patch makes the packed_git_mru global a value instead of
a pointer and makes use of list.h.


Signed-off-by: phionah bugosi 
---
 builtin/pack-objects.c |  5 +++--
 cache.h|  7 ---
 list.h |  6 ++
 packfile.c | 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a57b4f0..189123f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "list.h"
 #include "config.h"
 #include "attr.h"
 #include "object.h"
@@ -1012,7 +1013,7 @@ static int want_object_in_pack(const unsigned char *sha1,
return want;
}
 
-   for (entry = packed_git_mru->head; entry; entry = entry->next) {
+   for (entry = packed_git_mru.head; entry; entry = entry->next) {
struct packed_git *p = entry->item;
off_t offset;
 
@@ -1030,7 +1031,7 @@ static int want_object_in_pack(const unsigned char *sha1,
}
want = want_found_object(exclude, p);
if (!exclude && want > 0)
-   mru_mark(packed_git_mru, entry);
+   mru_mark(&packed_git_mru, entry);
if (want != -1)
return want;
}
diff --git a/cache.h b/cache.h
index a916bc7..c8d7086 100644
--- a/cache.h
+++ b/cache.h
@@ -1585,13 +1585,6 @@ extern struct packed_git {
char pack_name[FLEX_ARRAY]; /* more */
 } *packed_git;
 
-/*
- * A most-recently-used ordered version of the packed_git list, which can
- * be iterated instead of packed_git (and marked via mru_mark).
- */
-struct mru;
-extern struct mru *packed_git_mru;
-
 struct pack_entry {
off_t offset;
unsigned char sha1[20];
diff --git a/list.h b/list.h
index a226a87..525703b 100644
--- a/list.h
+++ b/list.h
@@ -26,6 +26,12 @@
 #define LIST_H 1
 
 /*
+ * A most-recently-used ordered version of the packed_git list, which can
+ * be iterated instead of packed_git (and marked via mru_mark).
+ */
+extern struct mru packed_git_mru;
+
+/*
  * The definitions of this file are adopted from those which can be
  * found in the Linux kernel headers to enable people familiar with the
  * latter find their way in these sources as well.
diff --git a/packfile.c b/packfile.c
index f86fa05..61a61aa 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "list.h"
 #include "mru.h"
 #include "pack.h"
 #include "dir.h"
@@ -41,8 +42,7 @@ static size_t peak_pack_mapped;
 static size_t pack_mapped;
 struct packed_git *packed_git;
 
-static struct mru packed_git_mru_storage;
-struct mru *packed_git_mru = &packed_git_mru_storage;
+struct mru packed_git_mru;
 
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
@@ -861,9 +861,9 @@ static void prepare_packed_git_mru(void)
 {
struct packed_git *p;
 
-   mru_clear(packed_git_mru);
+   mru_clear(&packed_git_mru);
for (p = packed_git; p; p = p->next)
-   mru_append(packed_git_mru, p);
+   mru_append(&packed_git_mru, p);
 }
 
 static int prepare_packed_git_run_once = 0;
@@ -1832,9 +1832,9 @@ int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
if (!packed_git)
return 0;
 
-   for (p = packed_git_mru->head; p; p = p->next) {
+   for (p = packed_git_mru.head; p; p = p->next) {
if (fill_pack_entry(sha1, e, p->item)) {
-   mru_mark(packed_git_mru, p);
+   mru_mark(&packed_git_mru, p);
return 1;
}
}
-- 
2.7.4



[PATCH] rev-parse: add --is-shallow-repository

2017-09-18 Thread Øystein Walle
Running `git fetch --unshallow` on a repo that is not in fact shallow
produces a fatal error message. Add a helper to rev-parse that scripters
can use to determine whether a repo is shallow or not.
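A hypothetical script using the new flag might look like the following sketch; the helper name is made up, and it assumes a git new enough to understand `--is-shallow-repository`:

```shell
# Unshallow only when the repo is actually shallow, avoiding the fatal
# error that `git fetch --unshallow` produces on a full clone.
fetch_maybe_unshallow () {
	if test "$(git rev-parse --is-shallow-repository)" = true
	then
		git fetch --unshallow
	else
		git fetch
	fi
}
```

Before this flag, scripts had to probe for `$GIT_DIR/shallow` themselves, which bakes in knowledge of the repository layout.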

Signed-off-by: Øystein Walle 
---
 Documentation/git-rev-parse.txt |  3 +++
 builtin/rev-parse.c |  5 +
 t/t1500-rev-parse.sh| 15 +++
 3 files changed, 23 insertions(+)

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index b1293f24b..0917b8207 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -235,6 +235,9 @@ print a message to stderr and exit with nonzero status.
 --is-bare-repository::
When the repository is bare print "true", otherwise "false".
 
+--is-shallow-repository::
+   When the repository is shallow print "true", otherwise "false".
+
--resolve-git-dir <path>::
	Check if <path> is a valid repository or a gitfile that
points at a valid repository, and print the location of the
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 2bd28d3c0..c923207f2 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
: "false");
continue;
}
+   if (!strcmp(arg, "--is-shallow-repository")) {
+   printf("%s\n", is_repository_shallow() ? "true"
+   : "false");
+   continue;
+   }
if (!strcmp(arg, "--shared-index-path")) {
if (read_cache() < 0)
die(_("Could not read the index"));
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 03d3c7f6d..9d3433a30 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' '
test_cmp expect actual
 '
 
+test_expect_success 'git-path shallow repository' '
+   test_commit test_commit &&
+   echo true >expect &&
+   git clone --depth 1 --no-local . shallow &&
+   test_when_finished "rm -rf shallow" &&
+   git -C shallow rev-parse --is-shallow-repository >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'git-path notshallow repository' '
+   echo false >expect &&
+   git rev-parse --is-shallow-repository >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'showing the superproject correctly' '
git rev-parse --show-superproject-working-tree >out &&
test_must_be_empty out &&
-- 
2.11.0.485.g4e59582



Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms

2017-09-18 Thread Brandon Williams
On 09/13, Stefan Beller wrote:
> On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williams  wrote:
> > Create protocol.{c,h} and provide functions which future servers and
> > clients can use to determine which protocol to use or is being used.
> >
> > Also introduce the 'GIT_PROTOCOL' environment variable which will be
> > used to communicate a colon separated list of keys with optional values
> > to a server.  Unknown keys and values must be tolerated.  This mechanism
> > is used to communicate which version of the wire protocol a client would
> > like to use with a server.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  Documentation/config.txt | 16 +++
> >  Documentation/git.txt|  5 
> >  Makefile |  1 +
> >  cache.h  |  7 +
> >  protocol.c   | 72 
> > 
> >  protocol.h   | 15 ++
> >  6 files changed, 116 insertions(+)
> >  create mode 100644 protocol.c
> >  create mode 100644 protocol.h
> >
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index dc4e3f58a..d5b28a32c 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -2517,6 +2517,22 @@ The protocol names currently used by git are:
> >  `hg` to allow the `git-remote-hg` helper)
> >  --
> >
> > +protocol.version::
> 
> It would be cool to set a set of versions that are good. (I am not sure
> if that can be deferred to a later patch.)
> 
>   Consider we'd have versions 0,1,2,3,4 in the future:
>   In an ideal world the client and server would talk using v4
>   as it is the most advanced protocol, right?
>   Maybe a security/performance issue is found on the server side
>   with say protocol v3. Still no big deal as we are speaking v4.
>   But then consider an issue is found on the client side with v4.
>   Then the client would happily talk 0..3 while the server would
>   love to talk using 0,1,2,4.
> 
> The way I think about protocol version negotiation is that
> both parties involved have a set of versions that they tolerate
> to talk (they might understand more than the tolerated set, but the
> user forbade some), and the goal of the negotiation is to find
> the highest version number that is part of both the server set
> and client set. So quite naturally with this line of thinking the
> configuration is to configure a set of versions, which is what
> I propose here. Maybe even in the wire format, separated
> with colons?

I'm sure it wouldn't take too much to change this to be a multi-valued
config.  Because after this series there is just v0 and v1 I didn't
think through this case too much.  If others agree then I can go ahead
and make it so in a reroll.

> 
> > +   If set, clients will attempt to communicate with a server using
> > +   the specified protocol version.  If unset, no attempt will be
> > +   made by the client to communicate using a particular protocol
> > +   version, this results in protocol version 0 being used.
> 
> This sounds as if we're going to be really shy at first and only
> users that care will try out new versions at their own risk.
> From a users POV this may be frustrating as I would imagine that
> people want to run
> 
>   git config --global protocol.version 2
> 
> to try it out and then realize that some of their hosts do not speak
> 2, so they have to actually configure it per repo/remote.

The point would be to be able to set this globally, not per-repo.  Even
if a repo doesn't speak v2 it should be able to gracefully degrade to v1
without the user having to do anything.  The reason there is this escape
hatch is that, if doing the protocol negotiation out of band causes
issues when communicating with a server, it can be shut off.


> > +   Supported versions:
> 
> > +* `0` - the original wire protocol.
> 
> In the future this may be misleading as it doesn't specify the date of
> when it was original. e.g. are capabilities already supported in "original"?
> 
> Maybe phrase it as "wire protocol as of v2.14" ? (Though this sounds
> as if new capabilities added in the future are not allowed)

Yeah I can see how this could be misleading, though I'm not sure how
best to word it to avoid that.

> 
> 
> > +
> > +extern enum protocol_version parse_protocol_version(const char *value);
> > +extern enum protocol_version get_protocol_version_config(void);
> > +extern enum protocol_version determine_protocol_version_server(void);
> > +extern enum protocol_version determine_protocol_version_client(const char *server_response);
> 
> Here is a good place to have some comments.

-- 
Brandon Williams


Re: [PATCH 3/8] daemon: recognize hidden request arguments

2017-09-18 Thread Brandon Williams
On 09/13, Stefan Beller wrote:
> On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williams  wrote:
> > A normal request to git-daemon is structured as
> > "command path/to/repo\0host=..\0" and due to a bug in an old version of
> > git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the
> > command, 2009-06-04) we aren't able to place any extra args (separated
> > by NULs) besides the host.
> >
> > In order to get around this limitation teach git-daemon to recognize
> > additional request arguments hidden behind a second NUL byte.  Requests
> > can then be structured like:
> > "command path/to/repo\0host=..\0\0version=1\0key=value\0".  git-daemon
> > can then parse out the extra arguments and set 'GIT_PROTOCOL'
> > accordingly.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  daemon.c | 71 
> > +++-
> >  1 file changed, 61 insertions(+), 10 deletions(-)
> >
> > diff --git a/daemon.c b/daemon.c
> > index 30747075f..250dbf82c 100644
> > --- a/daemon.c
> > +++ b/daemon.c
> > @@ -282,7 +282,7 @@ static const char *path_ok(const char *directory, struct hostinfo *hi)
> > return NULL;/* Fallthrough. Deny by default */
> >  }
> >
> > -typedef int (*daemon_service_fn)(void);
> > +typedef int (*daemon_service_fn)(const struct argv_array *env);
> >  struct daemon_service {
> > const char *name;
> > const char *config_name;
> > @@ -363,7 +363,7 @@ static int run_access_hook(struct daemon_service *service, const char *dir,
> >  }
> >
> >  static int run_service(const char *dir, struct daemon_service *service,
> > -  struct hostinfo *hi)
> > +  struct hostinfo *hi, const struct argv_array *env)
> >  {
> > const char *path;
> > int enabled = service->enabled;
> > @@ -422,7 +422,7 @@ static int run_service(const char *dir, struct daemon_service *service,
> >  */
> > signal(SIGTERM, SIG_IGN);
> >
> > -   return service->fn();
> > +   return service->fn(env);
> >  }
> >
> >  static void copy_to_log(int fd)
> > @@ -462,25 +462,34 @@ static int run_service_command(struct child_process *cld)
> > return finish_command(cld);
> >  }
> >
> > -static int upload_pack(void)
> > +static int upload_pack(const struct argv_array *env)
> >  {
> > struct child_process cld = CHILD_PROCESS_INIT;
> > 	argv_array_pushl(&cld.args, "upload-pack", "--strict", NULL);
> > 	argv_array_pushf(&cld.args, "--timeout=%u", timeout);
> > +
> > +	argv_array_pushv(&cld.env_array, env->argv);
> > +
> > 	return run_service_command(&cld);
> >  }
> >
> > -static int upload_archive(void)
> > +static int upload_archive(const struct argv_array *env)
> >  {
> > struct child_process cld = CHILD_PROCESS_INIT;
> > 	argv_array_push(&cld.args, "upload-archive");
> > +
> > +	argv_array_pushv(&cld.env_array, env->argv);
> > +
> > 	return run_service_command(&cld);
> >  }
> >
> > -static int receive_pack(void)
> > +static int receive_pack(const struct argv_array *env)
> >  {
> > struct child_process cld = CHILD_PROCESS_INIT;
> > 	argv_array_push(&cld.args, "receive-pack");
> > +
> > +	argv_array_pushv(&cld.env_array, env->argv);
> > +
> > 	return run_service_command(&cld);
> >  }
> >
> > @@ -574,7 +583,7 @@ static void canonicalize_client(struct strbuf *out, const char *in)
> >  /*
> >   * Read the host as supplied by the client connection.
> >   */
> > -static void parse_host_arg(struct hostinfo *hi, char *extra_args, int buflen)
> > +static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int buflen)
> >  {
> > char *val;
> > int vallen;
> > @@ -602,6 +611,39 @@ static void parse_host_arg(struct hostinfo *hi, char *extra_args, int buflen)
> > if (extra_args < end && *extra_args)
> > die("Invalid request");
> > }
> > +
> > +   return extra_args;
> > +}
> > +
> > +static void parse_extra_args(struct argv_array *env, const char *extra_args,
> > +int buflen)
> > +{
> > +   const char *end = extra_args + buflen;
> > +   struct strbuf git_protocol = STRBUF_INIT;
> > +
> > +   for (; extra_args < end; extra_args += strlen(extra_args) + 1) {
> > +   const char *arg = extra_args;
> > +
> > +   /*
> > +* Parse the extra arguments, adding most to 'git_protocol'
> > +* which will be used to set the 'GIT_PROTOCOL' envvar in the
> > +* service that will be run.
> > +*
> > +* If there ends up being a particular arg in the future that
> > +* git-daemon needs to parse specifically (like the 'host' arg)
> > +* then it can be parsed here and not added to 'git_protocol'.
> > +*/
> > +   if (*arg) {
> > 

Re: [PATCH 2/4] push, fetch: error out for submodule entries not pointing to commits

2017-09-18 Thread Brandon Williams
On 09/12, Jonathan Nieder wrote:
> From: Stefan Beller 
> 
> The check_has_commit helper resolves a submodule entry to a
> commit when validating its existence. As a side effect this means it
> tolerates a submodule entry pointing to a tag, which is not a valid
> submodule entry that git commands would know how to cope with.
> 
> Tighten the check to require an actual commit, not a tag pointing to a
> commit.
> 
> Also improve the error handling when a submodule entry points to a
> non-commit (e.g., a blob) to error out instead of warning and
> pretending the pointed-to object doesn't exist.
> 
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 

Looks good!

> ---
>  submodule.c| 33 +
>  t/t5531-deep-submodule-push.sh | 10 ++
>  2 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/submodule.c b/submodule.c
> index 3cea8221e0..e0da55920d 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -767,19 +767,36 @@ static int append_oid_to_argv(const struct object_id *oid, void *data)
>   return 0;
>  }
>  
> +struct has_commit_data {
> + int result;
> + const char *path;
> +};
> +
>  static int check_has_commit(const struct object_id *oid, void *data)
>  {
> - int *has_commit = data;
> + struct has_commit_data *cb = data;
>  
> - if (!lookup_commit_reference(oid))
> - *has_commit = 0;
> + enum object_type type = sha1_object_info(oid->hash, NULL);
>  
> - return 0;
> + switch (type) {
> + case OBJ_COMMIT:
> + return 0;
> + case OBJ_BAD:
> + /*
> +  * Object is missing or invalid. If invalid, an error message
> +  * has already been printed.
> +  */
> + cb->result = 0;
> + return 0;
> + default:
> + die(_("submodule entry '%s' (%s) is a %s, not a commit"),
> + cb->path, oid_to_hex(oid), typename(type));
> + }
>  }
>  
>  static int submodule_has_commits(const char *path, struct oid_array *commits)
>  {
> - int has_commit = 1;
> + struct has_commit_data has_commit = { 1, path };
>  
>   /*
>* Perform a cheap, but incorrect check for the existence of 'commits'.
> @@ -795,7 +812,7 @@ static int submodule_has_commits(const char *path, struct oid_array *commits)
>  
> 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
>  
> - if (has_commit) {
> + if (has_commit.result) {
>   /*
>* Even if the submodule is checked out and the commit is
>* present, make sure it exists in the submodule's object store
> @@ -814,12 +831,12 @@ static int submodule_has_commits(const char *path, struct oid_array *commits)
>   cp.dir = path;
>  
> 	if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1) || out.len)
> - has_commit = 0;
> + has_commit.result = 0;
>  
> 	strbuf_release(&out);
>   }
>  
> - return has_commit;
> + return has_commit.result;
>  }
>  
>  static int submodule_needs_pushing(const char *path, struct oid_array *commits)
> diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh
> index 0f84a53146..39cb2c1c34 100755
> --- a/t/t5531-deep-submodule-push.sh
> +++ b/t/t5531-deep-submodule-push.sh
> @@ -298,6 +298,16 @@ test_expect_success 'push succeeds if submodule commit disabling recursion from
>   )
>  '
>  
> +test_expect_success 'submodule entry pointing at a tag is error' '
> + git -C work/gar/bage tag -a test1 -m "tag" &&
> + tag=$(git -C work/gar/bage rev-parse test1^{tag}) &&
> +	git -C work update-index --cacheinfo 160000 "$tag" gar/bage &&
> + git -C work commit -m "bad commit" &&
> + test_when_finished "git -C work reset --hard HEAD^" &&
> +	test_must_fail git -C work push --recurse-submodules=on-demand ../pub.git master 2>err &&
> + test_i18ngrep "is a tag, not a commit" err
> +'
> +
>  test_expect_success 'push fails if recurse submodules option passed as yes' '
>   (
>   cd work/gar/bage &&
> -- 
> 2.14.1.690.gbb1197296e
> 

-- 
Brandon Williams


Re: [RFC PATCH v2 1/2] implement fetching of moved submodules

2017-09-18 Thread Brandon Williams
On 09/15, Heiko Voigt wrote:
> We store the changed submodules paths to calculate which submodule needs
> fetching. This does not work for moved submodules since their paths do
> not stay the same in case of a moved submodule. In case of new
> submodules we do not have a path in the current checkout, since they
> just appeared in this fetch.
> 
> It is more general to collect the submodule names for changes instead of
> their paths to include the above cases.
> 
> With the change described above we implement 'on-demand' fetching of
> changes in moved submodules.
> 
> Note: This does only work when repositories have a .gitmodules file. In
> other words: It breaks if we do not get a name for a repository.
> IIRC, consensus was that this is a requirement to get nice submodule
> handling these days?
> 
> NEEDSWORK: This breaks t5531 and t5545 because they do not use a
> .gitmodules file. I will add a fallback to paths to help such users.
> 
> Signed-off-by: Heiko Voigt 
> ---
> This an update of the previous series[1] to the current master. The
> fallback is still missing but now it should not conflict with any topics
> in flight anymore (hopefully).

So the idea is to collect changed submodules' names instead of their
paths, so that if they happen to move you don't have to worry about
the path changing underneath you.  This should be good once those tests
get fixed.

Thanks for working on cleaning this up! :)

> 
> Cheers Heiko
> 
> [1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/
> 
>  submodule.c | 91 
> +
>  t/t5526-fetch-submodules.sh | 35 +
>  2 files changed, 85 insertions(+), 41 deletions(-)
> 
> diff --git a/submodule.c b/submodule.c
> index 3cea8221e0..38b9905e43 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -21,7 +21,7 @@
>  #include "parse-options.h"
>  
>  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
> -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
> +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
>  static int initialized_fetch_ref_tips;
>  static struct oid_array ref_tips_before_fetch;
>  static struct oid_array ref_tips_after_fetch;
> @@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
>  }
>  
>  static struct oid_array *submodule_commits(struct string_list *submodules,
> -const char *path)
> +const char *name)
>  {
>   struct string_list_item *item;
>  
> - item = string_list_insert(submodules, path);
> + item = string_list_insert(submodules, name);
>   if (item->util)
>   return (struct oid_array *) item->util;
>  
> @@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct string_list *submodules,
>   return (struct oid_array *) item->util;
>  }
>  
> +struct collect_changed_submodules_cb_data {
> + struct string_list *changed;
> + const struct object_id *commit_oid;
> +};
> +
>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
> struct diff_options *options,
> void *data)
>  {
> + struct collect_changed_submodules_cb_data *me = data;
> + struct string_list *changed = me->changed;
> + const struct object_id *commit_oid = me->commit_oid;
>   int i;
> - struct string_list *changed = data;
>  
>   for (i = 0; i < q->nr; i++) {
>   struct diff_filepair *p = q->queue[i];
>   struct oid_array *commits;
> + const struct submodule *submodule;
> +
>   if (!S_ISGITLINK(p->two->mode))
>   continue;
>  
> - if (S_ISGITLINK(p->one->mode)) {
> - /*
> -  * NEEDSWORK: We should honor the name configured in
> -  * the .gitmodules file of the commit we are examining
> -  * here to be able to correctly follow submodules
> -  * being moved around.
> -  */
> - commits = submodule_commits(changed, p->two->path);
> -		oid_array_append(commits, &p->two->oid);
> - } else {
> - /* Submodule is new or was moved here */
> - /*
> -  * NEEDSWORK: When the .git directories of submodules
> -  * live inside the superprojects .git directory some
> -  * day we should fetch new submodules directly into
> -  * that location too when config or options request
> -  * that so they can be checked out from there.
> -  */
> + submodule = submodule_from_path(commit_oid, p->two->path);
> + if 

Re: [RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch

2017-09-18 Thread Brandon Williams
On 09/15, Heiko Voigt wrote:
> To make extending this logic later easier.
> 
> Signed-off-by: Heiko Voigt 

I like the result of this patch, its much cleaner.

> ---
>  submodule.c | 74 
> ++---
>  1 file changed, 37 insertions(+), 37 deletions(-)
> 
> diff --git a/submodule.c b/submodule.c
> index 38b9905e43..fa44fc59f2 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch {
>  };
>  #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
>  
> +static int get_fetch_recurse_config(const struct submodule *submodule,
> + struct submodule_parallel_fetch *spf)
> +{
> + if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT)
> + return spf->command_line_option;
> +
> + if (submodule) {
> + char *key;
> + const char *value;
> +
> + int fetch_recurse = submodule->fetch_recurse;
> +		key = xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
> +		if (!repo_config_get_string_const(the_repository, key, &value)) {
> +			fetch_recurse = parse_fetch_recurse_submodules_arg(key, value);
> + }
> + free(key);
> +
> + if (fetch_recurse != RECURSE_SUBMODULES_NONE)
> +			/* local config overrules everything except commandline */
> + return fetch_recurse;
> + }
> +
> + return spf->default_option;
> +}
> +
>  static int get_next_submodule(struct child_process *cp,
> struct strbuf *err, void *data, void **task_cb)
>  {
> @@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process *cp,
>  
>   submodule = submodule_from_path(_oid, ce->name);
>  
> - default_argv = "yes";
> - if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> - int fetch_recurse = RECURSE_SUBMODULES_NONE;
> -
> - if (submodule) {
> - char *key;
> - const char *value;
> -
> - fetch_recurse = submodule->fetch_recurse;
> - key = 
> xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
> - if 
> (!repo_config_get_string_const(the_repository, key, )) {
> - fetch_recurse = 
> parse_fetch_recurse_submodules_arg(key, value);
> - }
> - free(key);
> - }
> -
> - if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
> - if (fetch_recurse == RECURSE_SUBMODULES_OFF)
> - continue;
> - if (fetch_recurse == 
> RECURSE_SUBMODULES_ON_DEMAND) {
> - if 
> (!unsorted_string_list_lookup(_submodule_names,
> -  
> submodule->name))
> - continue;
> - default_argv = "on-demand";
> - }
> - } else {
> - if (spf->default_option == 
> RECURSE_SUBMODULES_OFF)
> - continue;
> - if (spf->default_option == 
> RECURSE_SUBMODULES_ON_DEMAND) {
> - if 
> (!unsorted_string_list_lookup(&changed_submodule_names,
> -   
> submodule->name))
> - continue;
> - default_argv = "on-demand";
> - }
> - }
> - } else if (spf->command_line_option == 
> RECURSE_SUBMODULES_ON_DEMAND) {
> - if 
> (!unsorted_string_list_lookup(&changed_submodule_names,
> + switch (get_fetch_recurse_config(submodule, spf))
> + {
> + default:
> + case RECURSE_SUBMODULES_DEFAULT:
> + case RECURSE_SUBMODULES_ON_DEMAND:
> + if (!submodule || 
> !unsorted_string_list_lookup(&changed_submodule_names,
>submodule->name))
>   continue;
>   default_argv = "on-demand";
> + break;
> + case RECURSE_SUBMODULES_ON:
> + default_argv = "yes";
> + break;
> + case RECURSE_SUBMODULES_OFF:
> + continue;
>   }
>  
>   strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name);
> -- 
> 2.14.1.145.gb3622a4
> 

-- 
Brandon Williams


Re: [PATCH] protocol: make parse_protocol_version() private

2017-09-18 Thread Brandon Williams
On 09/17, Ramsay Jones wrote:
> 
> Signed-off-by: Ramsay Jones 
> ---
> 
> Hi Brandon,
> 
> If you need to re-roll your 'bw/protocol-v1' branch, could you please
> squash this into the relevant patch (commit 45954f179e, "protocol:
> introduce protocol extention mechanisms", 13-09-2017).
> 
> This assumes you agree that this symbol does not need to be public; if
> not, then please just ignore! ;-)

Thanks!  I've updated my local version of the series to reflect this.

> 
> Thanks!
> 
> ATB,
> Ramsay Jones
> 
>  protocol.c | 2 +-
>  protocol.h | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/protocol.c b/protocol.c
> index 1b16c7b9a..369503065 100644
> --- a/protocol.c
> +++ b/protocol.c
> @@ -2,7 +2,7 @@
>  #include "config.h"
>  #include "protocol.h"
>  
> -enum protocol_version parse_protocol_version(const char *value)
> +static enum protocol_version parse_protocol_version(const char *value)
>  {
>   if (!strcmp(value, "0"))
>   return protocol_v0;
> diff --git a/protocol.h b/protocol.h
> index 2fa6486d0..18f9a5235 100644
> --- a/protocol.h
> +++ b/protocol.h
> @@ -7,7 +7,6 @@ enum protocol_version {
>   protocol_v1 = 1,
>  };
>  
> -extern enum protocol_version parse_protocol_version(const char *value);
>  extern enum protocol_version get_protocol_version_config(void);
>  extern enum protocol_version determine_protocol_version_server(void);
>  extern enum protocol_version determine_protocol_version_client(const char 
> *server_response);
> -- 
> 2.14.0

-- 
Brandon Williams


RE: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-18 Thread Ben Peart
> -Original Message-
> From: Torsten Bögershausen [mailto:tbo...@web.de]
> Sent: Monday, September 18, 2017 11:43 AM
> To: Ben Peart ; Junio C Hamano
> ; Ben Peart 
> Cc: david.tur...@twosigma.com; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index
> extension
> 
> On 2017-09-18 15:38, Ben Peart wrote:
> >
> >
> > On 9/17/2017 4:02 AM, Junio C Hamano wrote:
> >> Ben Peart  writes:
> >>
> >>> diff --git a/t/helper/test-dump-fsmonitor.c
> >>> b/t/helper/test-dump-fsmonitor.c new file mode 100644 index
> >>> 00..482d749bb9
> >>> --- /dev/null
> >>> +++ b/t/helper/test-dump-fsmonitor.c
> >>> @@ -0,0 +1,21 @@
> >>> +#include "cache.h"
> >>> +
> >>> +int cmd_main(int ac, const char **av) {
> >>> +    struct index_state *istate = &the_index;
> >>> +    int i;
> >>> +
> >>> +    setup_git_directory();
> >>> +    if (do_read_index(istate, get_index_file(), 0) < 0)
> >>> +    die("unable to read index file");
> >>> +    if (!istate->fsmonitor_last_update) {
> >>> +    printf("no fsmonitor\n");
> >>> +    return 0;
> >>> +    }
> >>> +    printf("fsmonitor last update %"PRIuMAX"\n",
> >>> istate->fsmonitor_last_update);
> >>
> >> After pushing this out and had Travis complain, I queued a squash on
> >> top of this to cast the argument to (uintmax_t), like you did in an
> >> earlier step (I think it was [PATCH 04/12]).
> >>
> >
> > Thanks. I'll update this to cast it as (uint64_t) as that is what
> > get/put_be64 use.  As far as I can tell they both map to the same
> > thing (unsigned long long) so there isn't functional difference.
> (Just to double-check): This is the way to print "PRIuMAX" correctly  (on all
> platforms):
> 
> printf("fsmonitor last update %"PRIuMAX"\n",  (uintmax_t)istate-
> >fsmonitor_last_update);
> 

Should I just make the variable type itself uintmax_t and then just skip
the cast altogether? I went with uint64_t because that is what
getnanotime returned.



[PATCH 2/2] read_info_alternates: warn on non-trivial errors

2017-09-18 Thread Jeff King
When we fail to open $GIT_DIR/info/alternates, we silently
assume there are no alternates. This is the right thing to
do for ENOENT, but not for other errors.

A hard error is probably overkill here. If we fail to read
an alternates file then either we'll complete our operation
anyway, or we'll fail to find some needed object. Either
way, a warning is good idea. And we already have a helper
function to handle this pattern; let's just call
warn_on_fopen_error().

Note that technically the errno from strbuf_read_file()
might be from a read() error, not open(). But since read()
would never return ENOENT or ENOTDIR, and since it produces
a generic "unable to access" error, it's suitable for
handling errors from either.

Signed-off-by: Jeff King 
---
I'm pretty comfortable with the rationale in the final paragraph. But if
we're concerned that warn_on_fopen_errors() might change, we can swap
out strbuf_read_file() for fopen()+strbuf_read().

 sha1_file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sha1_file.c b/sha1_file.c
index b1e4193679..9cec326298 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -432,6 +432,7 @@ static void read_info_alternates(const char * 
relative_base, int depth)
 
path = xstrfmt("%s/info/alternates", relative_base);
if (strbuf_read_file(&buf, path, 1024) < 0) {
+   warn_on_fopen_errors(path);
free(path);
return;
}
-- 
2.14.1.1014.g252e627ae0


[PATCH 1/2] read_info_alternates: read contents into strbuf

2017-09-18 Thread Jeff King
The link_alt_odb_entries() function has always taken a
ptr/len pair as input. Until cf3c635210 (alternates: accept
double-quoted paths, 2016-12-12), we made a copy of those
bytes in a string. But after that commit, we switched to
parsing the input left-to-right, and we ignore "len"
totally, instead reading until we hit a NUL.

This has mostly gone unnoticed for a few reasons:

  1. All but one caller passes a NUL-terminated string, with
 "len" pointing to the NUL.

  2. The remaining caller, read_info_alternates(), passes in
 an mmap'd file. Unless the file is an exact multiple of
 the page size, it will generally be followed by NUL
 padding to the end of the page, which just works.

The easiest way to demonstrate the problem is to build with:

  make SANITIZE=address NO_MMAP=Nope test

Any test which involves $GIT_DIR/info/alternates will fail,
as the mmap emulation (correctly) does not add an extra NUL,
and ASAN complains about reading past the end of the buffer.

One solution would be to teach link_alt_odb_entries() to
respect "len". But it's actually a bit tricky, since we
depend on unquote_c_style() under the hood, and it has no
ptr/len variant.

We could also just make a NUL-terminated copy of the input
bytes and operate on that. But since all but one caller
already is passing a string, instead let's just fix that
caller to provide NUL-terminated input in the first place,
by swapping out mmap for strbuf_read_file().

There's no advantage to using mmap on the alternates file.
It's not expected to be large (and anyway, we're copying its
contents into an in-memory linked list). Nor is using
git_open() buying us anything here, since we don't keep the
descriptor open for a long period of time.

Let's also drop the "len" parameter entirely from
link_alt_odb_entries(), since it's completely ignored. That
will avoid any new callers re-introducing a similar bug.

Reported-by: Michael Haggerty 
Signed-off-by: Jeff King 
---
I didn't reproduce with the mmap even-page-size thing, but I
think it's the same reason we didn't notice the "git log -G"
read-past-mmap issues for a long time. Which makes testing
with ASAN and NO_MMAP an interesting combination, as it
should find out any similar cases (and after this, the whole
test suite passes with that configuration).

 sha1_file.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5f71bbac3e..b1e4193679 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -398,7 +398,7 @@ static const char *parse_alt_odb_entry(const char *string,
return end;
 }
 
-static void link_alt_odb_entries(const char *alt, int len, int sep,
+static void link_alt_odb_entries(const char *alt, int sep,
 const char *relative_base, int depth)
 {
struct strbuf objdirbuf = STRBUF_INIT;
@@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int 
len, int sep,
 
 static void read_info_alternates(const char * relative_base, int depth)
 {
-   char *map;
-   size_t mapsz;
-   struct stat st;
char *path;
-   int fd;
+   struct strbuf buf = STRBUF_INIT;
 
path = xstrfmt("%s/info/alternates", relative_base);
-   fd = git_open(path);
-   free(path);
-   if (fd < 0)
-   return;
-   if (fstat(fd, &st) || (st.st_size == 0)) {
-   close(fd);
+   if (strbuf_read_file(&buf, path, 1024) < 0) {
+   free(path);
return;
}
-   mapsz = xsize_t(st.st_size);
-   map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
-   close(fd);
 
-   link_alt_odb_entries(map, mapsz, '\n', relative_base, depth);
-
-   munmap(map, mapsz);
+   link_alt_odb_entries(buf.buf, '\n', relative_base, depth);
+   strbuf_release(&buf);
 }
 
 struct alternate_object_database *alloc_alt_odb(const char *dir)
@@ -503,7 +492,7 @@ void add_to_alternates_file(const char *reference)
if (commit_lock_file(lock))
die_errno("unable to move new alternates file into 
place");
if (alt_odb_tail)
-   link_alt_odb_entries(reference, strlen(reference), 
'\n', NULL, 0);
+   link_alt_odb_entries(reference, '\n', NULL, 0);
}
free(alts);
 }
@@ -516,7 +505,7 @@ void add_to_alternates_memory(const char *reference)
 */
prepare_alt_odb();
 
-   link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+   link_alt_odb_entries(reference, '\n', NULL, 0);
 }
 
 /*
@@ -619,7 +608,7 @@ void prepare_alt_odb(void)
if (!alt) alt = "";
 
alt_odb_tail = &alt_odb_list;
-   link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
+   link_alt_odb_entries(alt, PATH_SEP, NULL, 0);
 
read_info_alternates(get_object_directory(), 0);
 }
-- 
2.14.1.1014.g252e627ae0



[PATCH 0/2] fix read past end of array in alternates files

2017-09-18 Thread Jeff King
This series fixes a regression in v2.11.1 where we might read past the
end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't
base the patch on there, for a few reasons:

  1. There's a trivial conflict when merging up (because of
 git_open_noatime() becoming just git_open() in the interim).

  2. The reproduction advice relies on our SANITIZE Makefile knob, which
 didn't exist back then.

  3. The second patch does not apply there because we don't have
 warn_on_fopen_errors(). Though admittedly it could be applied
 separately after merging up; it's just a clean-up on top.

It does apply on the current "maint".

  [1/2]: read_info_alternates: read contents into strbuf
  [2/2]: read_info_alternates: warn on non-trivial errors

 sha1_file.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

-Peff


Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-18 Thread Torsten Bögershausen
On 2017-09-18 15:38, Ben Peart wrote:
> 
> 
> On 9/17/2017 4:02 AM, Junio C Hamano wrote:
>> Ben Peart  writes:
>>
>>> diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
>>> new file mode 100644
>>> index 00..482d749bb9
>>> --- /dev/null
>>> +++ b/t/helper/test-dump-fsmonitor.c
>>> @@ -0,0 +1,21 @@
>>> +#include "cache.h"
>>> +
>>> +int cmd_main(int ac, const char **av)
>>> +{
>>> +    struct index_state *istate = &the_index;
>>> +    int i;
>>> +
>>> +    setup_git_directory();
>>> +    if (do_read_index(istate, get_index_file(), 0) < 0)
>>> +    die("unable to read index file");
>>> +    if (!istate->fsmonitor_last_update) {
>>> +    printf("no fsmonitor\n");
>>> +    return 0;
>>> +    }
>>> +    printf("fsmonitor last update %"PRIuMAX"\n",
>>> istate->fsmonitor_last_update);
>>
>> After pushing this out and had Travis complain, I queued a squash on
>> top of this to cast the argument to (uintmax_t), like you did in an
>> earlier step (I think it was [PATCH 04/12]).
>>
> 
> Thanks. I'll update this to cast it as (uint64_t) as that is what get/put_be64
> use.  As far as I can tell they both map to the same thing (unsigned long 
> long)
> so there isn't functional difference.
(Just to double-check): This is the way to print "PRIuMAX" correctly
 (on all platforms):

printf("fsmonitor last update %"PRIuMAX"\n",
 (uintmax_t)istate->fsmonitor_last_update);




Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-18 Thread Ben Peart



On 9/17/2017 12:47 AM, Junio C Hamano wrote:

Ben Peart  writes:


+write_integration_script() {
+   write_script .git/hooks/fsmonitor-test<<-\EOF
+   if [ "$#" -ne 2 ]; then
+   echo "$0: exactly 2 arguments expected"
+   exit 2
+   fi
+   if [ "$1" != 1 ]; then
+   echo -e "Unsupported core.fsmonitor hook version.\n" >&2
+   exit 1
+   fi


In addition to "echo -e" thing pointed out earlier, these look
somewhat unusual in our shell scripts, relative to what
Documentation/CodingGuidelines tells us to do:


I'm happy to make these changes.  I understand the difficulty of 
creating a consistent coding style especially after the fact.




Copy/paste is usually a developer's best friend as it allows you to avoid 
a lot of errors by reusing existing tested code.  One of the times it 
backfires is when that code doesn't match the current desired coding style.


I only point these out to help lend some additional impetus to the 
effort to formalize the coding style and to provide tooling to handle 
what should mostly be a mechanical process. IMO, the goal should be to 
save the maintainer and contributors the cost of having to write up and 
respond to formatting feedback. :)


Some stats on these same coding style errors in the current bash scripts:

298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space)
140 instances of "if \[ .* \]" (not using the preferred "test")
293 instances of "if .*; then"

Wouldn't it be great not to have to write up style feedback for when 
these all get copy/pasted into new scripts? :)






  - We prefer a space between the function name and the parentheses,
and no space inside the parentheses. The opening "{" should also
be on the same line.

(incorrect)
my_function(){
...

(correct)
my_function () {
...

  - We prefer "test" over "[ ... ]".

  - Do not write control structures on a single line with semicolon.
"then" should be on the next line for if statements, and "do"
should be on the next line for "while" and "for".

(incorrect)
if test -f hello; then
do this
fi

(correct)
if test -f hello
then
do this
fi


diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
new file mode 100755
index 00..aaee5d1fe3
--- /dev/null
+++ b/t/t7519/fsmonitor-watchman
@@ -0,0 +1,128 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use IPC::Open2;
+ ...
+   open (my $fh, ">", ".git/watchman-query.json");
+   print $fh "[\"query\", \"$git_work_tree\", { \
+   \"since\": $time, \
+   \"fields\": [\"name\"], \
+   \"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", 
\"exists\"]]] \
+   }]";
+   close $fh;
+
+   print CHLD_IN "[\"query\", \"$git_work_tree\", { \
+   \"since\": $time, \
+   \"fields\": [\"name\"], \
+   \"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", 
\"exists\"]]] \
+   }]";


This looks painful to read, write and maintain.  IIRC, Perl supports
the <<EOF here-document construct, too.

I agree!  I'm definitely *not* a perl developer so was unaware of this 
construct.  A few minutes with stack overflow and now I can clean this up.



+}
\ No newline at end of file


Oops.

Thanks.



Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-18 Thread Johannes Schindelin
Hi Junio,

On Sat, 16 Sep 2017, Junio C Hamano wrote:

> And as you alluded to, we may need to see if we can help making it
> easier to do the latter when needed.

That mistakes it for "Someone Else's Problem".

> Johannes Schindelin  writes:
> 
> > That's what you are buying into by having these "What's cooking" mails
> > that are in no usable way connected to the original threads.
> 
> For the above reason, I do not think this is a particularly useful
> stance to take.

I agree, but this is the process you chose to use.

> Do you have a concrete suggestion to make these individual entries more
> helpful for people who may want go back to the original thread in doing
> so?  In-reply-to: or References: fields of the "What's cooking" report
> would not help.  I often have the message IDs that made/helped me make
> these individual comments in the description; they alone would not react
> to mouse clicks, though.

Oh gawd, not even more stuff piled onto the mail format. Please stop.

> Having said that, I'd expect that individual contributors have past
> messages pertaining to the smaller numbers of their own topics in flight
> in their mailbox than the project wide "What's cooking" report covers,
> so perhaps an affort to devise such a mechanism may result in an
> over-engineering waste nobody finds useful.  I dunno.

I frequently find myself putting off that What's cooking mail because it
simply takes too long to study.

It probably tries to serve too many purposes at the same time, and thereby
none.

In the same vein as "to a hammer, everything looks like a nail": when it
comes to project management, a dedicated tool will always beat a
general-purpose mailing list.

Ciao,
Dscho


Re: [PATCH v6 12/12] fsmonitor: add a performance test

2017-09-18 Thread Johannes Schindelin
Hi Ben,

sorry for not catching this earlier:

On Fri, 15 Sep 2017, Ben Peart wrote:

> [...]
> +
> +int cmd_dropcaches(void)
> +{
> + HANDLE hProcess = GetCurrentProcess();
> + HANDLE hToken;
> + int status;
> +
> + if (!OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, 
> &hToken))
> + return error("Can't open current process token");
> +
> + if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> + return error("Can't get SeProfileSingleProcessPrivilege");
> +
> + CloseHandle(hToken);
> +
> + HMODULE ntdll = LoadLibrary("ntdll.dll");

Git's source code still tries to abide by C90, and for simplicity's sake,
this extends to the Windows-specific part. Therefore, the `ntdll` variable
needs to be declared at the beginning of the function (I do agree that it
makes for better code to reduce the scope of variables, but C90 simply did
not allow variables to be declared in the middle of functions).

I wanted to send a patch address this in the obvious way, but then I
encountered these lines:

> + DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> + (DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, 
> "NtSetSystemInformation");
> + if (!NtSetSystemInformation)
> + return error("Can't get function addresses, wrong Windows 
> version?");

It turns out that we have seen this plenty of times in Git for Windows'
fork, so much so that we came up with a nice helper to make this all a bit
more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and
INIT_PROC_ADDR helpers in compat/win32/lazyload.h.

Maybe this would be the perfect excuse to integrate this patch into
upstream Git? This would be the patch (you can also cherry-pick it from
25c4dc3a73352e72e995594cf1b4afa46e93d040 in https://github.com/dscho/git):

-- snip --
>From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin 
Date: Tue, 10 Jan 2017 23:14:20 +0100
Subject: [PATCH] Win32: simplify loading of DLL functions

Dynamic loading of DLL functions is duplicated in several places in Git
for Windows' source code.

This patch adds a pair of macros to simplify the process: the
DECLARE_PROC_ADDR(<dll>, <return type>, <function name>,
<argument types>...) macro to be used at the beginning of a
code block, and the INIT_PROC_ADDR(<function name>) macro to call before
using the declared function. The return value of the INIT_PROC_ADDR()
call has to be checked; If it is NULL, the function was not found in the
specified DLL.

Example:

DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
  LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);

if (!INIT_PROC_ADDR(CreateHardLinkW))
return error("Could not find CreateHardLinkW() function");

if (!CreateHardLinkW(source, target, NULL))
return error("could not create hardlink from %S to %S",
 source, target);
return 0;

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---
 compat/win32/lazyload.h | 44 
 1 file changed, 44 insertions(+)
 create mode 100644 compat/win32/lazyload.h

diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
new file mode 100644
index 000..91c10dad2fb
--- /dev/null
+++ b/compat/win32/lazyload.h
@@ -0,0 +1,44 @@
+#ifndef LAZYLOAD_H
+#define LAZYLOAD_H
+
+/* simplify loading of DLL functions */
+
+struct proc_addr {
+   const char *const dll;
+   const char *const function;
+   FARPROC pfunction;
+   unsigned initialized : 1;
+};
+
+/* Declares a function to be loaded dynamically from a DLL. */
+#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
+   static struct proc_addr proc_addr_##function = \
+   { #dll, #function, NULL, 0 }; \
+   static rettype (WINAPI *function)(__VA_ARGS__)
+
+/*
+ * Loads a function from a DLL (once-only).
+ * Returns non-NULL function pointer on success.
+ * Returns NULL + errno == ENOSYS on failure.
+ */
+#define INIT_PROC_ADDR(function) \
+   (function = get_proc_addr(&proc_addr_##function))
+
+static inline void *get_proc_addr(struct proc_addr *proc)
+{
+   /* only do this once */
+   if (!proc->initialized) {
+   HANDLE hnd;
+   proc->initialized = 1;
+   hnd = LoadLibraryExA(proc->dll, NULL,
+LOAD_LIBRARY_SEARCH_SYSTEM32);
+   if (hnd)
+   proc->pfunction = GetProcAddress(hnd, proc->function);
+   }
+   /* set ENOSYS if DLL or function was not found */
+   if (!proc->pfunction)
+   errno = ENOSYS;
+   return proc->pfunction;
+}
+
+#endif
-- snap --

With this patch, this fixup to your patch would make things compile (you
can also cherry-pick d05996fb61027512b8ab31a36c4a7a677dea11bb from my
fork):

-- snipsnap --
>From 

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-18 Thread Santiago Torres
Hello, everyone.

Sorry for being late in this thread, I was making sure I didn't say
anything outrageously wrong. 

> That's Stefan; I wouldn't have suggested any approach that uses the
> blob whose sole purpose is to serve as a temporary storage area to
> pass the information to the hook as part of the permanent record.
> 

Hmm, as far as I understand *this* is the status quo. We get an
ephemeral sha1/oid only if you have a hook attached. Otherwise, you will
never see the object at all, even if you have --signed set.

I suggested preparing this RFC/Patch because of the following reasons
(please let me know if my understanding of any of this is wrong...):

- I find it weird that the cli allows for a --signed push and
  nowhere in the receive-pack's feedback you're told if the push
  certificate is compute/stored/handled at all. I think that, at the
  latest, receive pack should let the user know whether the signed
  push succeeded, or if there is no hook attached to handle it.

- *if there is a hook* the blob is computed, but it is up to the
  hook itself to store it *somewhere*. This makes me feel like it's
  somewhat of a useless waste of computation if the hook is not
  meant to handle it anyway(which is just a post-receive hook). I
  find it rather weird that --signed is a builtin flag, and is
  handled on the server side only partially (just my two cents).

- To me, the way push certificates are handled now feels like having
  git commit -S producing a detached signature that you have to
  embed somehow in the resulting commit object. (As a matter of
  fact, many points on [1] seem to back this notion, and even recall
  some drawbacks on push certificates the way they are handled
  today)

I understand the concurrency concerns, so I agree with stefan's
solution, although I don't know how big of a deal it would be, if git
already supports --atomic pushes (admittedly, I haven't checked if there
are any guards in place for someone who pushes millions of refs
atomically). It'd be completely understandable to have a setting to
disable handling of --signed pushes and, ideally, a way to warn the user
of this feature being disabled if --signed is sent.

Maybe I'm missing the bigger picture, but to me it feels that a named
ref to the latest push certificate would make it easier to
handle/tool/sync around the push certificate solution?

Thanks,
-Santiago.

[1] 
https://public-inbox.org/git/CAJo=hJvWbjEM9E5AjPHgmQ=eY8xf=Q=xtukeu2ur7auuqea...@mail.gmail.com/




Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-18 Thread Ben Peart



On 9/16/2017 11:27 AM, Torsten Bögershausen wrote:

On 2017-09-15 21:20, Ben Peart wrote:

+if [ "$1" != 1 ]
+then
+   echo -e "Unsupported core.fsmonitor hook version.\n" >&2
+   exit 1
+fi


The echo -e not portable
(It was detected by a tighter version of the lint script,
  which I have here, but not yet send to the list :-(

This will do:
echo  "Unsupported core.fsmonitor hook version." >&2



Thanks, I'll fix these and the ones in the t/t7519 directory.


Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread Ben Peart



On 9/18/2017 9:32 AM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:peart...@gmail.com]
Sent: Monday, September 18, 2017 9:07 AM
To: David Turner ; 'Ben Peart'

Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
p...@peff.net
Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file
system monitor to speed up detecting new or changed files.

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
a file system monitor to speed up detecting new or changed files.



+int git_config_get_fsmonitor(void)
+{
+   if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+   if (core_fsmonitor && !*core_fsmonitor)
+   core_fsmonitor = NULL;
+
+   if (core_fsmonitor)
+   return 1;
+
+   return 0;
+}


This functions return values are backwards relative to the rest of the

git_config_* functions.

I'm confused.  If core.fsmonitor is configured, it returns 1. If it is not
configured, it returns 0. I don't make use of the -1 /* default value */ option
as I didn't see any use/value in this case. What is backwards?


The other git_config_* functions return 1 for error and 0 for success.


Hmm, I followed the model (ie copy/paste :)) used by the tracked cache. 
If you look at how it uses, the return value, it is 0 == false == remove 
the extension, 1 == true == add the extension.  I'm doing the same with 
fsmonitor.


static void tweak_untracked_cache(struct index_state *istate)
{
switch (git_config_get_untracked_cache()) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
remove_untracked_cache(istate);
break;
case 1: /* true */
add_untracked_cache(istate);
break;
default: /* unknown value: do nothing */
break;
}
}

void tweak_fsmonitor(struct index_state *istate)
{
switch (git_config_get_fsmonitor()) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
remove_fsmonitor(istate);
break;
case 1: /* true */
add_fsmonitor(istate);
break;
default: /* unknown value: do nothing */
break;
}
}




[snip]

+>   /*
+>* With fsmonitor, we can trust the untracked cache's valid field.
+>*/



Did you intend to make a comment here?


Sorry.  I was going to make a comment that I didn't see how that could work
since we weren't touching the untracked cache here, but then I saw the bit
further down.   I'm still not sure it works (see comment on 10/12), but at
least it could in theory work.
  



The logic here assumes that when the index is written out, it is valid 
including the untracked cache and the CE_FSMONITOR_VALID bits. 
Therefore it should still all be valid as of the time the fsmonitor was 
queried and the index saved.


Another way of thinking about this is that the fsmonitor extension is 
simply adding another persisted bit on each index entry.  It just gets 
persisted/restored differently than the other persisted bits.


Obviously, before we can use it assuming it reflects the *current* state 
of the working directory, we have to refresh the bits via the refresh logic.


Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-18 Thread Ben Peart



On 9/17/2017 4:02 AM, Junio C Hamano wrote:

Ben Peart  writes:


diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
new file mode 100644
index 00..482d749bb9
--- /dev/null
+++ b/t/helper/test-dump-fsmonitor.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+   struct index_state *istate = &the_index;
+   int i;
+
+   setup_git_directory();
+   if (do_read_index(istate, get_index_file(), 0) < 0)
+   die("unable to read index file");
+   if (!istate->fsmonitor_last_update) {
+   printf("no fsmonitor\n");
+   return 0;
+   }
+   printf("fsmonitor last update %"PRIuMAX"\n", 
istate->fsmonitor_last_update);


After pushing this out and had Travis complain, I queued a squash on
top of this to cast the argument to (uintmax_t), like you did in an
earlier step (I think it was [PATCH 04/12]).



Thanks. I'll update this to cast it as (uint64_t) as that is what 
get/put_be64 use.  As far as I can tell they both map to the same thing 
(unsigned long long) so there isn't functional difference.


Re: [PATCH] t2018: remove unwanted empty line

2017-09-18 Thread Kaartic Sivaraam

On Monday 18 September 2017 12:52 AM, Kevin Daudt wrote:

Why is this empty line unwanted? This kind of whitespace can help
separate logical sections, just like paragraphs would.

You're right. I seem to have sent the fix hastily because I hadn't seen
such separation before (forgetting the fact that git has contributors
whose ways of writing tests vary widely). I should stop and think the
next time rather than spamming the list. Sorry for this.


---
Kaartic


RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread David Turner
> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, September 18, 2017 9:07 AM
> To: David Turner ; 'Ben Peart'
> 
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a 
> file
> system monitor to speed up detecting new or changed files.
> 
> Thanks for taking the time to review/provide feedback!
> 
> On 9/15/2017 5:35 PM, David Turner wrote:
> >> -Original Message-
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Friday, September 15, 2017 3:21 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner ; ava...@gmail.com;
> >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
> >> a file system monitor to speed up detecting new or changed files.
> >
> >> +int git_config_get_fsmonitor(void)
> >> +{
> >> +  if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> >> +  core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> >> +
> >> +  if (core_fsmonitor && !*core_fsmonitor)
> >> +  core_fsmonitor = NULL;
> >> +
> >> +  if (core_fsmonitor)
> >> +  return 1;
> >> +
> >> +  return 0;
> >> +}
> >
> > This functions return values are backwards relative to the rest of the
> git_config_* functions.
> 
> I'm confused.  If core.fsmonitor is configured, it returns 1. If it is not
> configured, it returns 0. I don't make use of the -1 /* default value */ 
> option
> as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

> > [snip]
> >
> > +>  /*
> > +>   * With fsmonitor, we can trust the untracked cache's valid field.
> > +>   */
> >
> 
> Did you intend to make a comment here?

Sorry.  I was going to make a comment that I didn't see how that could work 
since we weren't touching the untracked cache here, but then I saw the bit 
further down.   I'm still not sure it works (see comment on 10/12), but at
least it could in theory work.
 



Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.

2017-09-18 Thread Ben Peart



On 9/17/2017 4:03 AM, Junio C Hamano wrote:

Ben Peart  writes:


+[[fsmonitor-watchman]]
+fsmonitor-watchman
+~~~


I've queued a mini squash on top to make sure the  line aligns
with the length of the string above it by adding three ~'s here.



Thanks, I'll do the same assuming there will be another re-roll.


Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.

2017-09-18 Thread Ben Peart



On 9/15/2017 3:43 PM, David Turner wrote:




-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor
extension.

This includes the core.fsmonitor setting, the query-fsmonitor hook, and the
fsmonitor index extension.

Signed-off-by: Ben Peart 
---
  Documentation/config.txt |  6 ++
  Documentation/githooks.txt   | 23 +++
  Documentation/technical/index-format.txt | 19 +++
  3 files changed, 48 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt index
dc4e3f58a2..c196007a27 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -413,6 +413,12 @@ core.protectNTFS::
8.3 "short" names.
Defaults to `true` on Windows, and `false` elsewhere.

+core.fsmonitor::
+   If set, the value of this variable is used as a command which
+   will identify all files that may have changed since the
+   requested date/time. This information is used to speed up git by
+   avoiding unnecessary processing of files that have not changed.


I'm confused here.  You have a file called "fsmonitor-watchman", which seems to 
discuss the protocol for core.fsmonitor scripts in general, and you have this 
documentation, which does not link to that file.  Can you clarify this?


I'll add the missing link to the documentation in githooks.txt.  The 
documentation should be enough for someone to develop another 
integration script.


The fsmonitor-watchman script allows people to easily use this patch 
series with the existing Watchman monitor but it can certainly also be 
used as a sample for how to integrate with another file system monitor.







+The hook should output to stdout the list of all files in the working
+directory that may have changed since the requested time.  The logic
+should be inclusive so that it does not miss any potential changes.


+"It is OK to include files which have not actually changed.  Newly-created and 
deleted files should also be included.  When files are renamed, both the old and the new 
name should be included."

Also, please discuss case sensitivity issues (e.g. on OS X).


+The paths should be relative to the root of the working directory and
+be separated by a single NUL.





+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+   time which is stored as the nanoseconds elapsed since midnight,
+   January 1, 1970.


Nit: Please specify signed or unsigned for these.  (I expect to be getting
out of cryosleep around 2262, and I want to know if my old git repos will
keep working...)



While I'm not opposed to specifying unsigned, I did notice that the only 
place signed/unsigned is specified today is in "index entry." Everywhere 
else doesn't specify so I left it off for consistency.  I've not seen 
negative version numbers nor negative time so am not entirely sure it is 
necessary to clarify. :)



+  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+is not CE_FSMONITOR_VALID.




Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread Ben Peart

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file 
system
monitor to speed up detecting new or changed files.
  

+int git_config_get_fsmonitor(void)
+{
+   if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+   if (core_fsmonitor && !*core_fsmonitor)
+   core_fsmonitor = NULL;
+
+   if (core_fsmonitor)
+   return 1;
+
+   return 0;
+}


This functions return values are backwards relative to the rest of the 
git_config_* functions.


I'm confused.  If core.fsmonitor is configured, it returns 1. If it is 
not configured, it returns 0. I don't make use of the -1 /* default 
value */ option as I didn't see any use/value in this case. What is 
backwards?




[snip]

+>   /*
+>* With fsmonitor, we can trust the untracked cache's valid field.
+>*/



Did you intend to make a comment here?


[snip]


+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+   unsigned long sz)
+{


If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.



Good catch!  Thank you.


[snip]


+   /* a fsmonitor process can return '*' to indicate all entries are invalid */


That's not documented in your documentation.  Also, I'm not sure I like it:
what if I have a file whose name starts with '*'?  Yeah, that would be silly,
but this indicates the need for the protocol to have some sort of signaling
mechanism that's out-of-band.  Maybe have some key\0value\0 pairs and then
\0\0 and then the list of files?  Or, if you want to keep it really simple,
allow an entry of '/' (which is an invalid filename) to mean 'all'.



Yea, this was an optimization I added late in the game to get around an 
issue in Watchman where it returns the name of every file the first time 
you query it (rather than the set of files that have actually changed 
since the requested time).


I didn't realize the wild card '*' was a valid character for a filename. 
 I'll switch to '/' as you suggest as I don't want to complicate things 
unnecessarily to handle this relatively rare optimization.  I'll also 
get it documented properly.  Thanks!



+void add_fsmonitor(struct index_state *istate) {
+   int i;
+
+   if (!istate->fsmonitor_last_update) {

[snip]

+   /* reset the untracked cache */


Is this really necessary?  Shouldn't the untracked cache be in a correct state 
already?



When fsmonitor is not turned on, I'm not explicitly invalidating 
untracked cache directory entries as git makes changes to files. While I 
doubt the sequence happens of 1) git making changes to files, *then* 2) 
turning on fsmonitor - I thought it better safe than sorry to assume 
that pattern won't ever happen in the future.  Especially since turning 
on the extension is rare and the cost is low.



+/*
+ * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate


Nit: "s/entries/entry's/".
  



Re: [PATCH] test-drop-caches: mark file local symbols with static

2017-09-18 Thread Ben Peart



On 9/17/2017 12:14 PM, Ramsay Jones wrote:


Signed-off-by: Ramsay Jones 
---

Hi Ben,

If you need to re-roll your 'bp/fsmonitor' branch, could you please
squash this into the relevant patch (commit c6b5a28941, "fsmonitor:
add a performance test", 15-09-2017).


Absolutely.  Thanks!

I'd really appreciate some feedback on whether this works properly on 
platforms other than Windows.




Thanks!

ATB,
Ramsay Jones

  t/helper/test-drop-caches.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
index 717079865..17590170f 100644
--- a/t/helper/test-drop-caches.c
+++ b/t/helper/test-drop-caches.c
@@ -2,7 +2,7 @@
  
  #if defined(GIT_WINDOWS_NATIVE)
  
-int cmd_sync(void)
+static int cmd_sync(void)
  {
char Buffer[MAX_PATH];
DWORD dwRet;
@@ -49,7 +49,7 @@ typedef enum _SYSTEM_MEMORY_LIST_COMMAND {
MemoryCommandMax
  } SYSTEM_MEMORY_LIST_COMMAND;
  
-BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
+static BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
  {
BOOL bResult;
DWORD dwBufferLength;
@@ -77,7 +77,7 @@ BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
return bResult;
  }
  
-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
  {
HANDLE hProcess = GetCurrentProcess();
HANDLE hToken;
@@ -118,36 +118,36 @@ int cmd_dropcaches(void)
  
  #elif defined(__linux__)
  
-int cmd_sync(void)
+static int cmd_sync(void)
  {
return system("sync");
  }
  
-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
  {
return system("echo 3 | sudo tee /proc/sys/vm/drop_caches");
  }
  
  #elif defined(__APPLE__)
  
-int cmd_sync(void)
+static int cmd_sync(void)
  {
return system("sync");
  }
  
-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
  {
return system("sudo purge");
  }
  
  #else
  
-int cmd_sync(void)
+static int cmd_sync(void)
  {
return 0;
  }
  
-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
  {
return error("drop caches not implemented on this platform");
  }



Re: RFC v3: Another proposed hash function transition plan

2017-09-18 Thread Gilles Van Assche
Hi Johannes,

> SHA-256 got much more cryptanalysis than SHA3-256 […].

I do not think this is true. Keccak/SHA-3 actually got (and is still
getting) a lot of cryptanalysis, with papers published at renowned
crypto conferences [1].

Keccak/SHA-3 is recognized to have a significant safety margin. E.g.,
one can cut the number of rounds in half (as in Keyak or KangarooTwelve)
and still get a very strong function. I don't think we could say the
same for SHA-256 or SHA-512…

Kind regards,
Gilles, for the Keccak team

[1] https://keccak.team/third_party.html



Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()

2017-09-18 Thread Derrick Stolee

On 9/17/2017 5:51 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


+int cmd_main(int ac, const char **av)
+{
+   setup_git_directory();


As far as I recall, we do not (yet) allow declaration after
statement in our codebase.  Move this down to make it after all
decls.


Will fix.


+
+   unsigned int hash_delt = 0x13579BDF;
+   unsigned int hash_base = 0x01020304;
+   struct object_id oid;
+
+   int i, count = 0;
+   int n = sizeof(struct object_id) / sizeof(int);


It probably is technically OK to assume sizeof(int) always equals to
sizeof(unsigned), but because you use 'n' _only_ to work with uint
and never with int, it would make more sense to match.


Will fix. I also notice that "n" should be const.


But I do not think we want this "clever" optimization that involves
'n' in the first place.

>>> +  while (count++ < 10) {

+   for (i = 0; i < n; i++)
+   ((unsigned int*)oid.hash)[i] = hash_base;


Does it make sense to assume that uint is always 4-byte (so this
code won't work if it is 8-byte on your platform) and doing this is
faster than using platform-optimized memcpy()?


I'm not sure what you mean by using memcpy to improve this, because
it would require calling memcpy in the inner loop, such as

for (i = 0; i < n; i++)
memcpy(oid.hash + i * sizeof(unsigned), &hash_base,
   sizeof(unsigned));

I'm probably misunderstanding your intended use of memcpy().


+   find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
+
+   hash_base += hash_delt;
+   }


I also wonder if this is measuring the right thing.  I am guessing
that by making many queries for a unique abbreviation of "random"
(well, it is deterministic, but my point is these queries are not
based on the names of objects that exist in the repository) hashes,
this test measures how much time it takes for us to decide that such
a random hash does not exist.  In the real life use, we make the
opposite query far more frequently: we have an object that we _know_
exists in the repository and we try to find a sufficient length to
disambiguate it from others, and I suspect that these two use
different logic.  Don't you need to be measuring the time it takes
to compute the shortest abbreviation of an object that exists
instead?


First, this doesn't just measure the time it takes to determine
non-existence, because it finds the len required to disambiguate an
"incoming" hash from all known hashes. When testing, I put in a
simple printf to report the result abbreviation so I could see how
often it needed to be extended. In this sense, the test exposes the
while loop that is removed by PATCH 2/3.

Second, your criticism about extant hashes is valid, and one I
struggled to reconcile. I see two issues with testing known hashes:

1. By determining the hash exists, we have inspected the file that
   contains it (either a .idx or the loose-object directory). This
   has side-effects that warm up the file cache so the looped method
   is artificially faster to find matching objects. The effect is
   particularly significant on a repo with exactly one packfile.

2. If we iterate over the entire set of objects, this test takes
   O(N*t(N)) time, where t(N) is the average time to compute a
   minimum abbreviation. For large repos, this can take several
   minutes. Instead, even with the constant 100,000 test hashes, we
   have an O(t(N)) test. We could avoid the asymptotic growth by
   limiting the number of existing hashes we use, but how do we
   find a sufficiently uniform sample from them?

By looking at some other perf tests, I see that we can add a
prerequisite action. I will investigate making another helper that
uniformly selects a set of hashes from the repo and writes them
to stdout in a random order. p0008-abbrev.sh will run the helper as
a prerequisite, saving the data to a file. test-abbrev will read
the hashes from stdin to test find_unique_abbrev. This should avoid
the side-effects I mentioned (test-abbrev will not warm up indexes)
while also testing abbreviation lengths for existing objects.

I'll get started on these changes and send a new patch with new perf
data in a couple days. Please let me know if there is a better way
to measure performance for this change.

Thanks,
-Stolee


