Re: RFC: Design and code of partial clones (now, missing commits and trees OK)
Jonathan Tan writes:

> For those interested in partial clones and/or missing objects in repos,
> I've updated my original partialclone patches to not require an explicit
> list of promises. Fetch/clone still only permits exclusion of blobs, but
> the infrastructure is there for a local repo to support missing trees
> and commits as well.
> ...
> Demo
>
> Obtain a repository.
>
>     $ make prefix=$HOME/local install
>     $ cd $HOME/tmp
>     $ git clone https://github.com/git/git
>
> Make it advertise the new feature and allow requests for arbitrary blobs.
>
>     $ git -C git config uploadpack.advertiseblobmaxbytes 1
>     $ git -C git config uploadpack.allowanysha1inwant 1
>
> Perform the partial clone and check that it is indeed smaller. Specify
> "file://" in order to test the partial clone mechanism. (If not, Git will
> perform a local clone, which unselectively copies every object.)
>
>     $ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2
>     $ git clone "file://$(pwd)/git" git3
>     $ du -sh git2 git3
>     116M    git2
>     129M    git3
>
> Observe that the new repo is automatically configured to fetch missing objects
> from the original repo. Subsequent fetches will also be partial.
>
>     $ cat git2/.git/config
>     [core]
>             repositoryformatversion = 1
>             filemode = true
>             bare = false
>             logallrefupdates = true
>     [remote "origin"]
>             url = [snip]
>             fetch = +refs/heads/*:refs/remotes/origin/*
>             blobmaxbytes = 10
>     [extensions]
>             partialclone = origin
>     [branch "master"]
>             remote = origin
>             merge = refs/heads/master

The above sequence of events makes quite a lot of sense. And the
following description of how it is designed (snipped) is clear enough
(at least to me) to allow me to say that I quite like it.
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
On Mon, Sep 18, 2017 at 04:17:24PM -0700, Jonathan Nieder wrote:

> phionah bugosi wrote:
>
> > Just to re-echo a previous change requested before in one of the mail
> > threads, we currently have two global variables declared:
> >
> >     struct mru packed_git_mru_storage;
> >     struct mru *packed_git_mru = &packed_git_mru_storage;
> >
> > We normally use pointers in C to point or refer to the same location
> > or space in memory from multiple places. That means in situations
> > where we will have the pointer be assigned to in many places in our
> > code. But in this case, we do not have any other logic referring or
> > assigning to the pointer packed_git_mru. In simple words, we actually
> > don't need a pointer in this case.
> >
> > Therefore this patch makes the packed_git_mru global a value instead of
> > a pointer and makes use of list.h
> >
> > Signed-off-by: phionah bugosi
> > ---
> >  builtin/pack-objects.c |  5 +++--
> >  cache.h                |  7 ---
> >  list.h                 |  6 ++
> >  packfile.c             | 12 ++--
> >  4 files changed, 15 insertions(+), 15 deletions(-)
>
> *puzzled* This appears to already be in "pu", with a different author.
> Did you independently make the same change? Or are you asking for a
> progress report on that change, and just phrasing it in a confusing
> way?

I pointed Phionah at your #leftoverbits comment in:

  https://public-inbox.org/git/20170912172839.gb144...@aiede.mtv.corp.google.com/

about moving packed_git_mru to list.h. But I'm afraid I wasn't very
clear in giving further guidance.

The goal is to build on _top_ of the patch in that message, and convert
the doubly-linked list implementation used in "struct mru" to use the
shared code in list.h. That should mostly involve touching mru.c and
mru.h, though I think we'd need to tweak the name of the "next" pointer
during the traversal, too.

-Peff
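The conversion Peff describes, replacing struct mru's hand-rolled links
with an intrusive node embedded in the struct itself, can be sketched
roughly as follows. This is a standalone illustration in the style of
git's list.h, not the actual mru.c code; the struct fields and helper
names are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly-linked list in the style of git's list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add(struct list_head *item, struct list_head *head)
{
	item->prev = head;
	item->next = head->next;
	head->next->prev = item;
	head->next = item;
}

static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

/* Hypothetical stand-in for struct packed_git. */
struct packed_git {
	const char *name;
	struct list_head mru;	/* node lives inside the struct itself */
};

/* Moving a pack to the front of the MRU list becomes two list ops. */
static void mru_mark(struct list_head *head, struct packed_git *p)
{
	list_del(&p->mru);
	list_add(&p->mru, head);
}
```

With the node embedded in the struct, the separate `struct mru` entry
allocation goes away; traversal would then use `list_for_each_entry`-style
iteration, which is where the rename of the "next" pointer that Peff
mentions comes in.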
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
On Mon, Sep 18, 2017 at 07:46:24PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
>
> > When we fail to open $GIT_DIR/info/alternates, we silently
> > assume there are no alternates. This is the right thing to
> > do for ENOENT, but not for other errors.
> >
> > A hard error is probably overkill here. If we fail to read
> > an alternates file then either we'll complete our operation
> > anyway, or we'll fail to find some needed object. Either
> > way, a warning is a good idea. And we already have a helper
> > function to handle this pattern; let's just call
> > warn_on_fopen_error().
>
> I think I prefer a hard error. What kind of cases are you imagining
> where it would be better to warn?
>
> E.g. for EIO, erroring out so that the user can try again seems better
> than hoping that the application will be able to cope with the more
> subtle error that comes from discovering some objects are missing.
>
> For EACCES, I can see how it makes sense to warn and move on, but no
> other errors like that are occurring to me.
>
> Thoughts?

Yes, EACCES is exactly the example that came to mind. But most
importantly, we don't know at this point whether the alternate is even
relevant to the current operation. So calling die() here would make some
cases fail that would otherwise succeed.

That's usually not a big deal, but we've had cases where being lazy
about dying helps people use git itself to repair the broken case.
Admittedly most of those chicken-and-egg problems are centered around
git-config (e.g., using "git config --edit" to repair broken config), so
it's perhaps less important here. But it seems like a reasonable
principle to follow in general.

If there's a compelling reason to die hard, I'd reconsider it. But I
can't really think of one (if we were _writing_ a new alternates file
and encountered a read error of the old data we're copying, then I think
it would be very important to bail before claiming a successful write.
But that's not what's happening here).

-Peff
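The ENOENT/ENOTDIR-versus-everything-else distinction being debated
follows the pattern behind git's warn_on_fopen_errors() helper. A rough
standalone sketch of the decision (function names here are hypothetical,
not git's code):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a failed open of an optional file like
 * $GIT_DIR/info/alternates deserves a warning, 0 if it should be
 * silently treated as "file absent".
 */
static int fopen_error_deserves_warning(int err)
{
	/* A missing file (or missing parent directory) is normal. */
	if (err == ENOENT || err == ENOTDIR)
		return 0;
	/* EACCES, EIO, ...: the file exists but we could not read it. */
	return 1;
}

static void report_optional_open_failure(const char *path, int err)
{
	if (fopen_error_deserves_warning(err))
		fprintf(stderr, "warning: unable to access '%s': %s\n",
			path, strerror(err));
}
```

The warn-vs-die question in the thread is then just what
report_optional_open_failure() does in the warning branch: print and
continue, or die().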
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
On Mon, Sep 18, 2017 at 07:42:53PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
>
> > Reported-by: Michael Haggerty
> > Signed-off-by: Jeff King
> > ---
> >  sha1_file.c | 29 +
> >  1 file changed, 9 insertions(+), 20 deletions(-)
>
> Thanks for tracking it down.

To be fair, Michael did most of the work in identifying and bisecting
the bug. He even wrote a very similar patch in parallel; I just swooped
in at the end.

> >  	path = xstrfmt("%s/info/alternates", relative_base);
> > -	fd = git_open(path);
> > -	free(path);
> > -	if (fd < 0)
> > -		return;
> > -	if (fstat(fd, &st) || (st.st_size == 0)) {
> > -		close(fd);
> > +	if (strbuf_read_file(&buf, path, 1024) < 0) {
> > +		free(path);
> >  		return;
>
> strbuf_read_file is careful to release buf on failure, so this doesn't
> leak (but it's a bit subtle).

Yep. I didn't think it was worth calling out with a comment since the
"don't allocate on failure" pattern is common to most of the strbuf
functions.

> What happened to the !st.st_size case? Is it eliminated for
> simplicity?

Good question, and the answer is yes. Obviously we can bail early on an
empty file, but I don't think there's any reason to complicate the code
with it (the original seems to come from d5a63b9983 (Alternate object
pool mechanism updates., 2005-08-14), where it avoided a corner case
that has long since been deleted).

-Peff
Re: [PATCH 0/2] fix read past end of array in alternates files
On Mon, Sep 18, 2017 at 07:36:03PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
>
> > This series fixes a regression in v2.11.1 where we might read past the
> > end of an mmap'd buffer. It was introduced in cf3c635210,
>
> The above information is super helpful. Can it go in one of the commit
> messages?

Er, didn't I?

> > base the patch on there, for a few reasons:
> >
> >   1. There's a trivial conflict when merging up (because of
> >      git_open_noatime() becoming just git_open() in the interim).
> >
> >   2. The reproduction advice relies on our SANITIZE Makefile knob, which
> >      didn't exist back then.
> >
> >   3. The second patch does not apply there because we don't have
> >      warn_on_fopen_errors(). Though admittedly it could be applied
> >      separately after merging up; it's just a clean-up on top.
>
> Even this part could go in a commit message, but it's fine for it not
> to.

IMHO this kind of meta information doesn't belong in the commit message.
It's useful to the maintainer to know where to apply the patch, but I
don't think it helps somebody who is reading "git log" output.

-Peff
Re: [PATCH v2] Improve performance of git status --ignored
Jameson Miller writes:

> Improve the performance of the directory listing logic when it wants to list
> non-empty ignored directories. In order to show non-empty ignored directories,
> the existing logic will recursively iterate through all contents of an ignored
> directory. This change introduces the optimization to stop iterating through
> the contents once it finds the first file.

Wow, such an obviously correct optimization. Very nicely explained, too.

> This can have a significant
> improvement in 'git status --ignored' performance in repositories with a large
> number of files in ignored directories.
>
> For an example of the performance difference on an example repository with
> 196,000 files in 400 ignored directories:
>
> | Command                    | Time (s) |
> | -------------------------- | -------- |
> | git status                 | 1.2      |
> | git status --ignored (old) | 3.9      |
> | git status --ignored (new) | 1.4      |
>
> Signed-off-by: Jameson Miller
> ---

I wish all the contributions I have to accept were as nicely done as
this one ;-) Thanks.

>  dir.c | 47 +-
>  1 file changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 1c55dc3..1d17b80 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -49,7 +49,7 @@ struct cached_dir {
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>  	struct index_state *istate, const char *path, int len,
>  	struct untracked_cache_dir *untracked,
> -	int check_only, const struct pathspec *pathspec);
> +	int check_only, int stop_at_first_file, const struct pathspec *pathspec);

We might want to make check_only and stop_at_first_file into a single
"unsigned flags" used as a collection of bits, but we can wait until we
start feeling the urge to add the third boolean parameter to this
function (at which point I'd probably demand a preliminary clean-up to
merge these two into a single flags word before adding the third one as
a new bit in that word).
> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>
>  	untracked = lookup_untracked(dir->untracked, untracked,
>  				     dirname + baselen, len - baselen);
> +
> +	/*
> +	 * If this is an excluded directory, then we only need to check if
> +	 * the directory contains any files.
> +	 */
>  	return read_directory_recursive(dir, istate, dirname, len,
> -					untracked, 1, pathspec);
> +					untracked, 1, exclude, pathspec);

Nicely explained in the in-code comment.

I'd assume that you want your Microsoft e-mail address used on the
Signed-off-by line to appear as the author, so I'll tweak this a bit to
make it so (otherwise, your 8...@gmail.com address would become the
author).

Thanks.
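Junio's suggestion above, folding check_only and stop_at_first_file into
one "unsigned flags" word, could look roughly like this. The flag names
and helpers are hypothetical, purely to illustrate the shape of the
clean-up:

```c
#include <assert.h>

/* Hypothetical flag bits replacing the two int parameters. */
#define DIRS_CHECK_ONLY		(1u << 0)
#define DIRS_STOP_AT_FIRST_FILE	(1u << 1)

/* Call sites build a single word instead of passing two booleans... */
static unsigned make_dir_flags(int check_only, int stop_at_first_file)
{
	unsigned flags = 0;
	if (check_only)
		flags |= DIRS_CHECK_ONLY;
	if (stop_at_first_file)
		flags |= DIRS_STOP_AT_FIRST_FILE;
	return flags;
}

/* ...and the callee tests bits rather than separate parameters. */
static int should_stop_early(unsigned flags)
{
	return (flags & DIRS_STOP_AT_FIRST_FILE) != 0;
}
```

Adding a third condition later then costs one new bit, not a new
parameter at every call site.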
Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository
Jonathan Nieder writes:

>>  test_expect_success 'showing the superproject correctly' '
>
> With the two tweaks mentioned above,
> Reviewed-by: Jonathan Nieder

I agree with the suggested fixes to the test titles, so I'll queue the
patch with the fixes squashed in. Hearing "yeah, the titles were
copy-pasted without adjusting, thanks for fixing, Jonathan!" from
Øystein would be super nice.

Thanks, both.
Re: [PATCH] t/README: fix typo and grammatically improve a sentence
Kaartic Sivaraam writes:

> Signed-off-by: Kaartic Sivaraam
> ---
>  t/README | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/t/README b/t/README
> index 2f95860369751..4b079e4494d93 100644
> --- a/t/README
> +++ b/t/README
> @@ -265,12 +265,12 @@ or:
>
>      $ sh ./t9200-git-cvsexport-commit.sh --run='-3 21'
>
> -As noted above, the test set is built going though items left to
> -right, so this:
> +As noted above, the test set is built by going through the items
> +from left to right, so this:
>
>      $ sh ./t9200-git-cvsexport-commit.sh --run='1-4 !3'
>
> -will run tests 1, 2, and 4. Items that comes later have higher
> +will run tests 1, 2, and 4. Items that come later have higher
>  precedence. It means that this:
>
>      $ sh ./t9200-git-cvsexport-commit.sh --run='!3 1-4'
>
> --
> https://github.com/git/git/pull/404

Both of these look to me like obvious improvements. I'll queue them
unless other people object.

Thanks.
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Junio C Hamano writes:

> But the point is that we do not want such an overhead in core, as
> all of that is a useless waste of the cycle for a site that wants to
> store the push certificate away outside of the repository itself.

By this I do not mean that cert blobs shouldn't become part of the
in-repository record in _all_ installations that receive signed push
certificates. An easy-to-reuse example shipped together with our source
would be a good thing, and an easy-to-enable sample hook may be even
better.

It's just that I do not want to have any "solution" in the core part
that everybody who wants to accept push certs must run, if the
"solution" is not something we can endorse as the best current practice.
And I do not yet see how anything along the lines of the patch that
started this thread, or its extension by wrapping the certs in a commit
chain, would become that "best current practice".

A sample hook, on the other hand, is simpler for people to experiment
with, and we might even come up with a universally useful best current
practice out of such an experiment.
Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
Ian Campbell writes:

> This can be useful e.g. in `filter-branch` when rewriting tags produced by
> older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel
> source tree:
>
>     $ git cat-file tag v2.6.12-rc2
>     object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
>     type commit
>     tag v2.6.12-rc2
>
>     Linux v2.6.12-rc2 release
>     -----BEGIN PGP SIGNATURE-----
>     Version: GnuPG v1.2.4 (GNU/Linux)
>
>     iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc
>     wznDbFU45A54dZC8RZ5JxyE=
>     =ESRP
>     -----END PGP SIGNATURE-----
>
>     $ git cat-file tag v2.6.12-rc2 | git mktag
>     error: char76: could not find "tagger "
>     fatal: invalid tag signature file
>     $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger
>     9e734775f7c22d2f89943ad6c745571f1930105f
>
> To that end, pass the new option to `mktag` in `filter-branch`.

Hmph. I cannot shake this nagging feeling that this is probably a
solution that is overly narrow to a single problem and that won't scale
into the future. There is no guarantee that a missing tagger will stay
the only kind of broken tag object that we produce in one version, later
notice as a mistake, and then forbid in another version. The approach of
adding an '--allow-missing-tagger' option implies that we'd end up
adding more and more such options, and filter-branch will need to use
all these '--allow-this-breakage' options we would ever add.

Even though I fully agree with the problem you are trying to solve
(i.e. we want to be able to replay an old history without our tool
rejecting the data we have), that was my first reaction when I read this
patch. IOW, my first reaction was "perhaps a single option
'--allow-broken' to cover the currently known and any future shape of
malformed tag data is more appropriate".

But then, if we look at the body of cmd_mktag(), it looks like this:

    int cmd_mktag(...)
    {
        read input into strbuf buf;
        call verify_tag on buf to sanity check;
        call write_sha1_file() on the contents of buf as a tag;
        report the object name;
    }

If we drop the "verification" step from the above, that essentially
becomes an equivalent of "hash-object -t tag -w --stdin". So I now have
to wonder if it may be sufficient to use "hash-object" in filter-branch,
without making this "allow malformed data that we would not permit if
the tag were being created today, only to help replaying an old, already
broken data" change to "git mktag".

Is there something that makes "hash-object" insufficient (like it still
does some extra checks we would want to disable, so that it cannot work
as a replacement for your "--allow-missing-tagger")?

Thanks.

> Signed-off-by: Ian Campbell
> ---
>  Documentation/git-mktag.txt |   9 +++-
>  builtin/mktag.c             | 100 +---
>  git-filter-branch.sh        |   2 +-
>  t/t3800-mktag.sh            |  33 ++-
>  4 files changed, 98 insertions(+), 46 deletions(-)
>
> diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt
> index fa6a75612..c720c7419 100644
> --- a/Documentation/git-mktag.txt
> +++ b/Documentation/git-mktag.txt
> @@ -9,7 +9,7 @@ git-mktag - Creates a tag object
>  SYNOPSIS
>  --------
>  [verse]
> -'git mktag'
> +'git mktag' [--allow-missing-tagger]
>
>  DESCRIPTION
>  -----------
> @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header. The
>  message part may contain a signature that Git itself doesn't
>  care about, but that can be verified with gpg.
>
> +OPTIONS
> +-------
> +--allow-missing-tagger::
> +	Allow the `tagger` line in the header to be omitted. This is
> +	rarely desirable but may be useful in recreating tags created
> +	by older Git.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
>
> diff --git a/builtin/mktag.c b/builtin/mktag.c
> index 031b750f0..0f5dae8d5 100644
> --- a/builtin/mktag.c
> +++ b/builtin/mktag.c
> @@ -1,4 +1,5 @@
>  #include "builtin.h"
> +#include "parse-options.h"
>  #include "tag.h"
>
>  /*
> @@ -15,6 +16,8 @@
>   * the shortest possible tagger-line.
>   */
>
> +static int allow_missing_tagger;
> +
>  /*
>   * We refuse to tag something we can't verify. Just because.
>   */
> @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size)
>  	unsigned char sha1[20];
>  	const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb;
>  	size_t len;
> +	const unsigned long min_size = allow_missing_tagger ? 71 : 84;
>
> -	if (size < 84)
> +	if (size < min_size)
>  		return error("wanna fool me ? you obviously got the size wrong !");
>
>  	buffer[size] = 0;
> @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size)
>  	/* Verify the tagger line
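Junio's hash-object question can be tried out directly in a throwaway
repository. The sketch below is illustrative only (paths and tag
contents are made up); it uses `--literally`, which skips hash-object's
format checking entirely. Whether the *default* checks would reject a
tagger-less tag depends on fsck's severity settings, which is exactly
the sort of question Junio raises:

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m base
obj=$(git rev-parse HEAD)

# A tagger-less tag object, as written by very old versions of Git:
printf 'object %s\ntype commit\ntag demo\n\nDemo tag\n' "$obj" >tag.txt

# mktag would reject this; hash-object --literally bypasses the checks:
tag=$(git hash-object -t tag -w --literally --stdin <tag.txt)
git cat-file -t "$tag"   # prints "tag"
```

If `--literally` turns out to be the only way to make hash-object accept
such input, that would answer Junio's question: plain hash-object is not
quite a drop-in replacement for `mktag --allow-missing-tagger`.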
Re: [PATCH 2/2] t9010-*.sh: skip all tests if the PIPE prereq is missing
Ramsay Jones wrote:

> Every test in this file, except one, is marked with the PIPE prereq.
> However, that lone test ('set up svn repo'), only performs some setup
> work and checks whether the following test should be executed (by
> setting an additional SVNREPO prerequisite). Since the following test
> also requires the PIPE prerequisite, performing the setup test, when
> the PIPE prerequisite is missing, is simply wasted effort. Use the
> skip-all test facility to skip all tests when the PIPE prerequisite is
> missing.
>
> Signed-off-by: Ramsay Jones
> ---
>  t/t9010-svn-fe.sh | 55 ---
>  1 file changed, 28 insertions(+), 27 deletions(-)

It was always the intention that this test script eventually be able to
run on Windows, but it was not meant to be. It would need to use a
socket or something for backflow to work on Windows.

Reviewed-by: Jonathan Nieder
Re: [PATCH 1/2] test-lib: use more compact expression in PIPE prerequisite
Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones
> ---
>  t/test-lib.sh | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)

Reviewed-by: Jonathan Nieder
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
Jeff King wrote:

> When we fail to open $GIT_DIR/info/alternates, we silently
> assume there are no alternates. This is the right thing to
> do for ENOENT, but not for other errors.
>
> A hard error is probably overkill here. If we fail to read
> an alternates file then either we'll complete our operation
> anyway, or we'll fail to find some needed object. Either
> way, a warning is a good idea. And we already have a helper
> function to handle this pattern; let's just call
> warn_on_fopen_error().

I think I prefer a hard error. What kind of cases are you imagining
where it would be better to warn?

E.g. for EIO, erroring out so that the user can try again seems better
than hoping that the application will be able to cope with the more
subtle error that comes from discovering some objects are missing.

For EACCES, I can see how it makes sense to warn and move on, but no
other errors like that are occurring to me.

Thoughts?

Thanks,
Jonathan

> Note that technically the errno from strbuf_read_file()
> might be from a read() error, not open(). But since read()
> would never return ENOENT or ENOTDIR, and since it produces
> a generic "unable to access" error, it's suitable for
> handling errors from either.
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King wrote:

> Reported-by: Michael Haggerty
> Signed-off-by: Jeff King
> ---
>  sha1_file.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)

Thanks for tracking it down.
Reviewed-by: Jonathan Nieder

[...]
> +++ b/sha1_file.c
[...]
> @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
>
>  static void read_info_alternates(const char * relative_base, int depth)
>  {
> -	char *map;
> -	size_t mapsz;
> -	struct stat st;
>  	char *path;
> -	int fd;
> +	struct strbuf buf = STRBUF_INIT;
>
>  	path = xstrfmt("%s/info/alternates", relative_base);
> -	fd = git_open(path);
> -	free(path);
> -	if (fd < 0)
> -		return;
> -	if (fstat(fd, &st) || (st.st_size == 0)) {
> -		close(fd);
> +	if (strbuf_read_file(&buf, path, 1024) < 0) {
> +		free(path);
>  		return;

strbuf_read_file is careful to release buf on failure, so this doesn't
leak (but it's a bit subtle).

What happened to the !st.st_size case? Is it eliminated for
simplicity?

The rest looks good.

Thanks,
Jonathan
Re: [PATCH 0/2] fix read past end of array in alternates files
Jeff King wrote:

> This series fixes a regression in v2.11.1 where we might read past the
> end of an mmap'd buffer. It was introduced in cf3c635210,

The above information is super helpful. Can it go in one of the commit
messages?

> but I didn't
> base the patch on there, for a few reasons:
>
>   1. There's a trivial conflict when merging up (because of
>      git_open_noatime() becoming just git_open() in the interim).
>
>   2. The reproduction advice relies on our SANITIZE Makefile knob, which
>      didn't exist back then.
>
>   3. The second patch does not apply there because we don't have
>      warn_on_fopen_errors(). Though admittedly it could be applied
>      separately after merging up; it's just a clean-up on top.

Even this part could go in a commit message, but it's fine for it not
to.

Thanks,
Jonathan
Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository
Hi,

Øystein Walle wrote:

> Running `git fetch --unshallow` on a repo that is not in fact shallow
> produces a fatal error message.

Hm, can you say more about the context? From a certain point of view, it
might make sense for that command to succeed instead: if the repo is
already unshallow, then why should "fetch --unshallow" complain instead
of declaring victory?

> Add a helper to rev-parse that scripters
> can use to determine whether a repo is shallow or not.
>
> Signed-off-by: Øystein Walle
> ---
>  Documentation/git-rev-parse.txt |  3 +++
>  builtin/rev-parse.c             |  5 +
>  t/t1500-rev-parse.sh            | 15 +++
>  3 files changed, 23 insertions(+)

Regardless, this new rev-parse --is-shallow-repository helper looks like
a good feature.

[...]
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>  				 : "false");
>  			continue;
>  		}
> +		if (!strcmp(arg, "--is-shallow-repository")) {
> +			printf("%s\n", is_repository_shallow() ? "true"
> +					: "false");
> +			continue;
> +		}

The implementation is straightforward and correct.

[...]
> --- a/t/t1500-rev-parse.sh
> +++ b/t/t1500-rev-parse.sh

Thanks for writing tests. \o/

> @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'git-path shallow repository' '

What does git-path mean here? I wonder if it's a copy/paste error. Did
you mean something like

    test_expect_success 'rev-parse --is-shallow-repository in shallow repo' '

?

> +	test_commit test_commit &&
> +	echo true >expect &&
> +	git clone --depth 1 --no-local . shallow &&
> +	test_when_finished "rm -rf shallow" &&
> +	git -C shallow rev-parse --is-shallow-repository >actual &&
> +	test_cmp expect actual
> +'
>
> +test_expect_success 'git-path notshallow repository' '

Likewise: should this be

    test_expect_success 'rev-parse --is-shallow-repository in non-shallow repo' '

?
> +	echo false >expect &&
> +	git rev-parse --is-shallow-repository >actual &&
> +	test_cmp expect actual
> +'
>
>  test_expect_success 'showing the superproject correctly' '

With the two tweaks mentioned above,
Reviewed-by: Jonathan Nieder

Thanks.
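The scripting use case motivating the patch looks roughly like the
sketch below (it requires a Git with the flag, i.e. one containing this
patch; the repo path is a throwaway example):

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"

# Guard the fatal case: only run --unshallow when actually shallow,
# instead of letting "git fetch --unshallow" die on a full repo.
if test "$(git -C "$tmp/repo" rev-parse --is-shallow-repository)" = true
then
	git -C "$tmp/repo" fetch --unshallow
fi
git -C "$tmp/repo" rev-parse --is-shallow-repository   # prints "false"
```

This is exactly the pattern the commit message describes: scripts can
branch on "true"/"false" rather than parsing a fatal error message.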
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Johannes Schindelin writes:

>> Do you have a concrete suggestion to make these individual entries more
>> helpful for people who may want to go back to the original thread in
>> doing so? In-reply-to: or References: fields of the "What's cooking"
>> report would not help. I often have the message IDs that made/helped me
>> make these individual comments in the description; they alone would not
>> react to mouse clicks, though.
>
> Oh gawd, not even more stuff piled onto the mail format. Please stop.
> ...
> It probably tries to serve too many purposes at the same time, and thereby
> none.

Well, this was started as my attempt to provide a public service that
shows a summary of what is happening in the entire integration tree, as
there was nothing like that before (and going to github.com and looking
at the 'pu' branch would not give you an easy overview). As many people
contribute many topics to the project, complaining that it talks about
too many topics would not get you anywhere.

If you find the "What's cooking" report not serving your needs, and if
no one else finds it serving his or her needs either, then I can stop
sending these out, of course, but I am not getting the impression that
we are at that point, at least not yet.
Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)
Lars Schneider writes:

> SZEDER noticed a bug in this series that I was about to fix:
> https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/
>
> I assume at this point a new patch is better than a re-roll, right?

Thanks, yes indeed.
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Santiago Torres writes:

> - *if there is a hook* the blob is computed, but it is up to the
>   hook itself to store it *somewhere*. This makes me feel like it's
>   somewhat of a useless waste of computation if the hook is not
>   meant to handle it anyway (which is just a post-receive hook). I
>   find it rather weird that --signed is a builtin flag, and is
>   handled on the server side only partially (just my two cents).

The way it was envisioned to be used is that the repository meant to be
protected by collected push certs may not be trusted as the permanent
store for push certs by all hosting sites. The hook may be told the name
of a blob to read its contents from, and is expected to store it away
somewhere else.

The only reason why we use a blob is that creating blobs in response to
pushes being executed in parallel will result in different blobs unless
there is a hash collision. Instead of us having to come up with and use
a different mechanism to create a unique temporary filename and feed
that to the hook, reusing a blob as such was the simplest.

> I understand the concurrency concerns, so I agree with Stefan's
> solution, although I don't know how big of a deal it would be, if git
> already supports --atomic pushes (admittedly, I haven't checked if
> there are any guards in place for someone who pushes millions of refs
> atomically). It'd be completely understandable to have a setting to
> disable handling of --signed pushes and, ideally, a way to warn the
> user of this feature being disabled if --signed is sent.

I do not think atomic helps at all, when one atomic push updates branch
A while another atomic push updates branch B. They can still go in
parallel, and their certificates must both be stored. You can somehow
serialize them and create a single strand of pearls to advance a single
ref, or you can let both fork two histories to store the push certs from
these two pushes and have somebody create a merge commit to join the
history.
But the point is that we do not want such an overhead in core, as all of
that is a useless waste of cycles for a site that wants to store the
push certificates away outside of the repository itself.
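A sample hook along the lines Junio describes could be as small as the
sketch below: receive-pack exposes the cert blob's name to hooks in the
GIT_PUSH_CERT environment variable, and the hook copies the blob's
contents out of the repository. CERT_STORE and its default path are
made-up configuration for the example, not anything git defines:

```shell
#!/bin/sh
# Sketch of a post-receive hook that archives push certificates
# outside the repository it protects.
archive_push_cert() {
	if test -n "$GIT_PUSH_CERT"
	then
		mkdir -p "$CERT_STORE" &&
		git cat-file blob "$GIT_PUSH_CERT" \
			>"$CERT_STORE/$GIT_PUSH_CERT"
	fi
}

CERT_STORE=${CERT_STORE:-/srv/push-certs}
archive_push_cert
```

Because the cert is read out of the object database by name, the site is
free to later prune the blob from the repository once the external store
has it, which is the "store it away somewhere else" workflow above.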
Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
Derrick Stolee writes:

>> But I do not think we want this "clever" optimization that involves
>> 'n' in the first place.
>>
>>> +	while (count++ < 10) {
>>> +		for (i = 0; i < n; i++)
>>> +			((unsigned int *)oid.hash)[i] = hash_base;
>>
>> Does it make sense to assume that uint is always 4-byte (so this
>> code won't work if it is 8-byte on your platform) and doing this is
>> faster than using platform-optimized memcpy()?
>
> I'm not sure what you mean by using memcpy to improve this, because
> it would require calling memcpy in the inner loop, such as
>
>     for (i = 0; i < n; i++)
>         memcpy(oid.hash + i * sizeof(unsigned), &hash_base,
>                sizeof(unsigned));

Sorry, I left it without saying as I thought it was obvious, but what I
meant was to use a whole "struct oid", not just a single unsigned
(repeated), as the hash that is tested. If you have an array of object
names you use in the test, then

    for (count = 0; count < limit; count++) {
        hashcpy(oid.hash, hashes[count]);
        ... do the probing ...
    }

> First, this doesn't just measure the time it takes to determine
> non-existence,

Sorry, my phrasing was indeed misleading. I know the time we spend to
see if we have or do not have the object is the largest cycle spender in
these codepaths (and even if it were, I do not think that is what you
are trying to optimize in these patches anyway).

But if I recall correctly, the ways we come up with the unique
abbreviation for an object that exists and for an object that does not
exist are different? And because most of the time (think: "git log -p"
output) we would be finding abbreviations for objects that we do have,
benchmarking and tweaking the code that comes up with an abbreviation
for an object that does not exist is not optimizing for the right case.

Back when I wrote that initial response, I didn't check how different
the code was between the two cases, but now I did.
It seems that in both cases we start from the shortest allowed length
and then extend the same way, and the only difference between the two
cases is that, when the full name refers to an object we do not have, we
return as soon as our candidate name is long enough not to match any
existing object, while otherwise we return only when the ambiguity is
fully resolved. I _think_ these amount to the same computation (i.e.
when an object with the full name exists, the amount of computation we
need to come up with its unique abbreviation is the same as the
computation we need to find the unique abbreviation for the same name in
another repository that has an identical set of objects, minus that
single object), so from that point of view, throwing random hashes, most
of which would not name any existing object, at get_short_oid() and
measuring how much time it takes to compute the minimum length of the
unique prefix should be sufficient.

Thanks.
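The quantity being benchmarked here, the minimum unique prefix length,
is easy to state independently of git's object store. The toy below
works over an in-memory list of hex names rather than pack indexes; it
is an illustration of the definition, not get_short_oid() itself:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model: given a hex object name and the other names in the
 * "repository", return the shortest prefix length that matches no
 * other name.  Like git, enforce a minimum of 4 hex digits.
 */
static size_t unique_abbrev_len(const char *name,
				const char **others, size_t n)
{
	size_t len = 4;		/* git's minimum abbreviation length */
	size_t i;

	for (i = 0; i < n; i++) {
		size_t common = 0;

		/* length of the common prefix with this other name */
		while (name[common] && name[common] == others[i][common])
			common++;
		/* we must be one digit longer than any shared prefix */
		if (common >= len)
			len = common + 1;
	}
	return len;
}
```

Whether the probed name itself exists only changes when the search can
stop, which is the point Junio makes above about the two cases amounting
to the same computation.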
Re: [PATCH] describe: teach --match to handle branches and remotes
On Mon, Sep 18, 2017 at 4:52 PM, Junio C Hamano wrote:

> Max Kirillov writes:
>
>> --match <pattern>::
>>  	Only consider tags matching the given `glob(7)` pattern,
>> -	excluding the "refs/tags/" prefix. This can be used to avoid
>> -	leaking private tags from the repository. If given multiple times, a
>> -	list of patterns will be accumulated, and tags matching any of the
>> -	patterns will be considered. Use `--no-match` to clear and reset the
>> -	list of patterns.
>> +	excluding the "refs/tags/" prefix. If used with `--all`, it also
>> +	considers local branches and remote-tracking references matching the
>> +	pattern, excluding respectively "refs/heads/" and "refs/remotes/"
>> +	prefix; references of other types are never considered. If given
>> +	multiple times, a list of patterns will be accumulated, and tags
>> +	matching any of the patterns will be considered. Use `--no-match` to
>> +	clear and reset the list of patterns.
>>
>> --exclude <pattern>::
>>  	Do not consider tags matching the given `glob(7)` pattern, excluding
>> -	the "refs/tags/" prefix. This can be used to narrow the tag space and
>> -	find only tags matching some meaningful criteria. If given multiple
>> -	times, a list of patterns will be accumulated and tags matching any
>> -	of the patterns will be excluded. When combined with --match a tag will
>> -	be considered when it matches at least one --match pattern and does not
>> +	the "refs/tags/" prefix. If used with `--all`, it also does not consider
>> +	local branches and remote-tracking references matching the pattern,
>> +	excluding respectively "refs/heads/" and "refs/remotes/" prefix;
>> +	references of other types are never considered. If given multiple times,
>> +	a list of patterns will be accumulated and tags matching any of the
>> +	patterns will be excluded. When combined with --match a tag will be
>> +	considered when it matches at least one --match pattern and does not
>>  	match any of the --exclude patterns.
Use `--no-exclude` to clear and >> reset the list of patterns. > > OK, I find this written clearly enough. > >> diff --git a/builtin/describe.c b/builtin/describe.c >> index 94ff2fba0b..2a2e998063 100644 >> --- a/builtin/describe.c >> +++ b/builtin/describe.c >> @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path, >> } >> } >> >> +/* Drops prefix. Returns NULL if the path is not expected with current >> settings. */ >> +static const char *get_path_to_match(int is_tag, int all, const char *path) >> +{ >> + if (is_tag) >> + return path + 10; > > This is a faithful conversion of the existing code that wants to > behave the same as original, but a bit more on this later. > >> + else if (all) { >> + if (starts_with(path, "refs/heads/")) >> + return path + 11; /* "refs/heads/..." */ >> + else if (starts_with(path, "refs/remotes/")) >> + return path + 13; /* "refs/remotes/..." */ >> + else >> + return 0; > > I think you can use skip_prefix() to avoid counting the length of > the prefix yourself, i.e. > > else if (all) { > const char *body; > > if (skip_prefix(path, "refs/heads/", &body)) > return body; > else if (skip_prefix(path, "refs/remotes/", &body)) > ... > } > > Whether you do the above or not, the last one that returns 0 should > return NULL (to the language it is the same thing, but to humans, we > write NULL when it is the null pointer, not the number 0). > >> + } else >> + return NULL; >> +} > > Perhaps the whole thing may want to be a bit more simplified, like: > > static const char *skip_ref_prefix(const char *path, int all) > { > const char *prefix[] = { > "refs/tags/", "refs/heads/", "refs/remotes/" > }; > const char *body; > int cnt; > int bound = all ? ARRAY_SIZE(prefix) : 1; > I found the implicit use of "bound = 1" means "we only care about tags" to be a bit weird here. I guess it's not really that big a deal overall, and this is definitely cleaner than the original implementation. 
> for (cnt = 0; cnt < bound; cnt++) > if (skip_prefix(path, prefix[cnt], &body)) > return body; > return NULL; > } > > The hardcoded +10 for "is_tag" case assumes that nothing other than > "refs/tags/something" would ever be used to call into this function when is_tag is true, and that may well be true in the current code and have been so ever since the original code was written, but it still smells like an invitation for future bugs. > > I dunno. > >> + >> static
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King writes: > We could also just make a NUL-terminated copy of the input > bytes and operate on that. But since all but one caller > already is passing a string, instead let's just fix that > caller to provide NUL-terminated input in the first place, > by swapping out mmap for strbuf_read_file(). > ... > Let's also drop the "len" parameter entirely from > link_alt_odb_entries(), since it's completely ignored. That > will avoid any new callers re-introducing a similar bug. Both design decisions make perfect sense to me. > sha1_file.c | 29 + > 1 file changed, 9 insertions(+), 20 deletions(-) And diffstat also agrees that it is a good change ;-)
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
> I am hoping that this last one is not allowed and we can use the > "same condition is checked every time we loop" version that hides > the ugliness inside the macro. By which you are referring to Jonathan's solution posted. Maybe we can combine the two solutions (checking for the list to not be NULL once, by Jonathan) and using an outer structure (SZEDER's solution) by replacing the condition by a for loop, roughly (untested): #define for_each_string_list_item(item,list) \ - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) +for (; list; list = NULL) +for (item = (list)->items; item < (list)->items + (list)->nr; ++item) as that would not mingle with any dangling else clause. It is also just one statement, such that if (bla) for_each_string_list_item { baz(item); } else foo; still works. Are there downsides to this combined approach?
Re: [PATCH] describe: teach --match to handle branches and remotes
Max Kirillov writes: > --match :: > Only consider tags matching the given `glob(7)` pattern, > - excluding the "refs/tags/" prefix. This can be used to avoid > - leaking private tags from the repository. If given multiple times, a > - list of patterns will be accumulated, and tags matching any of the > - patterns will be considered. Use `--no-match` to clear and reset the > - list of patterns. > + excluding the "refs/tags/" prefix. If used with `--all`, it also > + considers local branches and remote-tracking references matching the > + pattern, excluding respectively "refs/heads/" and "refs/remotes/" > + prefix; references of other types are never considered. If given > + multiple times, a list of patterns will be accumulated, and tags > + matching any of the patterns will be considered. Use `--no-match` to > + clear and reset the list of patterns. > > --exclude :: > Do not consider tags matching the given `glob(7)` pattern, excluding > - the "refs/tags/" prefix. This can be used to narrow the tag space and > - find only tags matching some meaningful criteria. If given multiple > - times, a list of patterns will be accumulated and tags matching any > - of the patterns will be excluded. When combined with --match a tag will > - be considered when it matches at least one --match pattern and does not > + the "refs/tags/" prefix. If used with `--all`, it also does not consider > + local branches and remote-tracking references matching the pattern, > + excluding respectively "refs/heads/" and "refs/remotes/" prefix; > + references of other types are never considered. If given multiple times, > + a list of patterns will be accumulated and tags matching any of the > + patterns will be excluded. When combined with --match a tag will be > + considered when it matches at least one --match pattern and does not > match any of the --exclude patterns. Use `--no-exclude` to clear and > reset the list of patterns. OK, I find this written clearly enough. 
> diff --git a/builtin/describe.c b/builtin/describe.c > index 94ff2fba0b..2a2e998063 100644 > --- a/builtin/describe.c > +++ b/builtin/describe.c > @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path, > } > } > > +/* Drops prefix. Returns NULL if the path is not expected with current > settings. */ > +static const char *get_path_to_match(int is_tag, int all, const char *path) > +{ > + if (is_tag) > + return path + 10; This is a faithful conversion of the existing code that wants to behave the same as original, but a bit more on this later. > + else if (all) { > + if (starts_with(path, "refs/heads/")) > + return path + 11; /* "refs/heads/..." */ > + else if (starts_with(path, "refs/remotes/")) > + return path + 13; /* "refs/remotes/..." */ > + else > + return 0; I think you can use skip_prefix() to avoid counting the length of the prefix yourself, i.e. else if (all) { const char *body; if (skip_prefix(path, "refs/heads/", &body)) return body; else if (skip_prefix(path, "refs/remotes/", &body)) ... } Whether you do the above or not, the last one that returns 0 should return NULL (to the language it is the same thing, but to humans, we write NULL when it is the null pointer, not the number 0). > + } else > + return NULL; > +} Perhaps the whole thing may want to be a bit more simplified, like: static const char *skip_ref_prefix(const char *path, int all) { const char *prefix[] = { "refs/tags/", "refs/heads/", "refs/remotes/" }; const char *body; int cnt; int bound = all ? ARRAY_SIZE(prefix) : 1; for (cnt = 0; cnt < bound; cnt++) if (skip_prefix(path, prefix[cnt], &body)) return body; return NULL; } The hardcoded +10 for "is_tag" case assumes that nothing other than "refs/tags/something" would ever be used to call into this function when is_tag is true, and that may well be true in the current code and have been so ever since the original code was written, but it still smells like an invitation for future bugs. I dunno. 
> + > static int get_name(const char *path, const struct object_id *oid, int flag, > void *cb_data) > { > int is_tag = starts_with(path, "refs/tags/"); > @@ -140,12 +156,13 @@ static int get_name(const char *path, const struct > object_id *oid, int flag, voi >*/ > if (exclude_patterns.nr) { > struct string_list_item *item; > + const char *path_to_match = get_path_to_match(is_tag, all, > path);
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Hi, phionah bugosi wrote: > Just to re-echo a previous change requested before in one of the mail > threads, we currently have two global variables declared: > > struct mru packed_git_mru_storage; > struct mru *packed_git_mru = &packed_git_mru_storage; > > We normally use pointers in C to point or refer to the same location > or space in memory from multiple places. That means in situations > where we will have the pointer be assigned to in many places in our > code. But in this case, we do not have any other logic referring or > assigning to the pointer packed_git_mru. In simple words we actually > don't need a pointer in this case. > > Therefore this patch makes packed_git_mru global a value instead of > a pointer and makes use of list.h > > Signed-off-by: phionah bugosi > --- > builtin/pack-objects.c | 5 +++-- > cache.h | 7 --- > list.h | 6 ++ > packfile.c | 12 ++-- > 4 files changed, 15 insertions(+), 15 deletions(-) *puzzled* This appears to already be in "pu", with a different author. Did you independently make the same change? Or are you asking for a progress report on that change, and just phrasing it in a confusing way? Confused, Jonathan
Re: RFC v3: Another proposed hash function transition plan
Hi, Gilles Van Assche wrote: > Hi Johannes, >> SHA-256 got much more cryptanalysis than SHA3-256 […]. > > I do not think this is true. Keccak/SHA-3 actually got (and is still > getting) a lot of cryptanalysis, with papers published at renowned > crypto conferences [1]. > > Keccak/SHA-3 is recognized to have a significant safety margin. E.g., > one can cut the number of rounds in half (as in Keyak or KangarooTwelve) > and still get a very strong function. I don't think we could say the > same for SHA-256 or SHA-512… I just wanted to thank you for paying attention to this conversation and weighing in. Most of the regulars in the git project are not crypto experts. This kind of extra information (and e.g. [2]) is very useful to us. Thanks, Jonathan > Kind regards, > Gilles, for the Keccak team > > [1] https://keccak.team/third_party.html [2] https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
Re: RFC v3: Another proposed hash function transition plan
Hi Gilles, On Mon, 18 Sep 2017, Gilles Van Assche wrote: > > SHA-256 got much more cryptanalysis than SHA3-256 […]. > > I do not think this is true. Please read what I said again: SHA-256 got much more cryptanalysis than SHA3-256. I never said that SHA3-256 got little cryptanalysis. Personally, I think that SHA3-256 got a ton more cryptanalysis than SHA-1, and that SHA-256 *still* got more cryptanalysis. But my opinion does not count, really. However, the two experts I pestered with questions over questions left me with that strong impression, and their opinion does count. > Keccak/SHA-3 actually got (and is still getting) a lot of cryptanalysis, > with papers published at renowned crypto conferences [1]. > > Keccak/SHA-3 is recognized to have a significant safety margin. E.g., > one can cut the number of rounds in half (as in Keyak or KangarooTwelve) > and still get a very strong function. I don't think we could say the > same for SHA-256 or SHA-512… Again, I do not want to criticize SHA3/Keccak. Personally, I have a lot of respect for Keccak. I also have a lot of respect for everybody who scrutinized the SHA2 family of algorithms. I also respect the fact that there are more implementations of SHA-256, and thanks to everybody seeming to demand SHA-256 checksums instead of SHA-1 or MD5 for downloads, bugs in those implementations are probably discovered relatively quickly, and I also cannot ignore the prospect of hardware support for SHA-256. In any case, having SHA3 as a fallback in case SHA-256 gets broken seems like a very good safety net to me. Ciao, Johannes
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
>> instead compared to the proposed configuration above. >> (Even better yet, then people could play around with "v1 only" >> and see how it falls apart on old servers) > > Except we can't start with an explicit whitelist because we must > fallback to v0 if v1 isn't supported otherwise we would break people. > > That is unless we have the semantics of: If not configured v0 will be > used, otherwise only use the configured protocol versions. > Good point. Thinking about this, how about: If not configured, we do as we want. (i.e. Git has full control over its decision-making process, which for now is "favor v0 over v1 as we are experimenting with v1". This strategy may change in the future to "prefer highest version number that both client and server can speak".) If it is configured, "use highest configured number from the given set". ?
Re: [PATCH v2] add test for bug in git-mv for recursive submodules
>> Took a little while but here is a more clean patch creating individual >> submodules for the nesting. >> >> Cheers Heiko Thanks for writing this test! > > Thanks. Stefan, does this look good to you now? Yes, though there are nits below. > It is not quite clear which step is expected to fail with the > current code by reading the test or the proposed log message. Does > "mv" refuse to work and we do not get to run "status", or does > "status" report a failure, or do we fail well before that? git-mv failing seems like a new possibility without incurring another process spawn with the new repository object. (Though then we could also just fix the recursed submodule) > The log message that only says "This does not work when ..." is not > helpful in figuring it out, either. Something like "This does not > work and fails to update the paths for its configurations" or > whatever that describes "what actually happens" (in contrast to > "what ought to happen", which you described clearly) should be > there. > > Description on how you happened to have discovered the issue feels a > lot less relevant compared to that, and it is totally useless if it > is unclear what the issue is in the first place. > >> t/t7001-mv.sh | 25 + >> 1 file changed, 25 insertions(+) >> >> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh >> index e365d1ff77..cbc5fb37fe 100755 >> --- a/t/t7001-mv.sh >> +++ b/t/t7001-mv.sh >> @@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested >> directories' ' >> test_cmp actual expect >> ' >> >> +test_expect_failure 'moving nested submodules' ' >> + git commit -am "cleanup commit" && >> + mkdir sub_nested_nested && >> + (cd sub_nested_nested && We seem to have different styles for nested shell. I prefer outside command && ( first nested command here && ... as that aligns indentation to the nesting level. 
I have seen the style you use a lot in the test suite, and we do not have a guideline in Documentation/CodingGuidelines, so I do not complain too loudly. ;) >> + touch nested_level2 && >> + git init && >> + git add . && >> + git commit -m "nested level 2" >> + ) && >> + mkdir sub_nested && >> + (cd sub_nested && >> + touch nested_level1 && >> + git init && >> + git add . && >> + git commit -m "nested level 1" >> + git submodule add ../sub_nested_nested && >> + git commit -m "add nested level 2" >> + ) && >> + git submodule add ./sub_nested nested_move && >> + git commit -m "add nested_move" && >> + git submodule update --init --recursive && So far a nice setup! >> + git mv nested_move sub_nested_moved && This is the offending command that produces the bug, as it will break most subsequent commands, such as >> + git status git-status is one of the basic commands. Without status to function, I think it is hard to recover your repo without a lot of in-depth knowledge of Git (submodules). I wonder if git-status should complain more gracefully and fallback to one of --ignore-submodules={dirty, all}, that actually still works. Maybe we could introduce a new default mode for this flag, that is "none-except-on-error", though this sounds as if we're fixing symptoms instead of the root cause.
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
On 09/18, Stefan Beller wrote: > >> From a users POV this may be frustrating as I would imagine that > >> people want to run > >> > >> git config --global protocol.version 2 > >> > >> to try it out and then realize that some of their hosts do not speak > >> 2, so they have to actually configure it per repo/remote. > > > > The point would be to be able to set this globally, not per-repo. Even > > if a repo doesn't speak v2 then it should be able to gracefully degrade > > to v1 without the user having to do anything. The reason why there is > > this escape hatch is if doing the protocol negotiation out of band > > causing issues with communicating with a server that it can be shut off. > > In the current situation it is easy to assume that if v1 (and not v0) > is configured, that the users intent is "to try out v1 and fallback > gracefully to v0". > > But this will change over time in the future! > > Eventually people will have the desire to say: > "Use version N+1, but never version N", because N has > performance or security issues; the user might not want > to bother to try N or even actively want to be affirmed that > Git will never use version N. > > In this future we need a mechanism, that either contains a > white list or black list of protocols. To keep it simple (I assume > there won't be many protocol versions), a single white list will do. > > However transitioning from the currently proposed "try the new > configured thing and fallback to whatever" to "this is the exact list > of options that Git will try for you" will be hard, as we may break people > if we do not unconditionally fall back to v0. > > That is why I propose to start with an explicit white list as then we > do not have to have a transition plan or otherwise work around the > issue. Also it doesn't hurt now to use > > git config --global protocol.version v1,v0 > > instead compared to the proposed configuration above. 
> (Even better yet, then people could play around with "v1 only" > and see how it falls apart on old servers) Except we can't start with an explicit whitelist because we must fallback to v0 if v1 isn't supported otherwise we would break people. That is unless we have the semantics of: If not configured v0 will be used, otherwise only use the configured protocol versions. -- Brandon Williams
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
>> From a users POV this may be frustrating as I would imagine that >> people want to run >> >> git config --global protocol.version 2 >> >> to try it out and then realize that some of their hosts do not speak >> 2, so they have to actually configure it per repo/remote. > > The point would be to be able to set this globally, not per-repo. Even > if a repo doesn't speak v2 then it should be able to gracefully degrade > to v1 without the user having to do anything. The reason why there is > this escape hatch is if doing the protocol negotiation out of band > causing issues with communicating with a server that it can be shut off. In the current situation it is easy to assume that if v1 (and not v0) is configured, that the users intent is "to try out v1 and fallback gracefully to v0". But this will change over time in the future! Eventually people will have the desire to say: "Use version N+1, but never version N", because N has performance or security issues; the user might not want to bother to try N or even actively want to be affirmed that Git will never use version N. In this future we need a mechanism, that either contains a white list or black list of protocols. To keep it simple (I assume there won't be many protocol versions), a single white list will do. However transitioning from the currently proposed "try the new configured thing and fallback to whatever" to "this is the exact list of options that Git will try for you" will be hard, as we may break people if we do not unconditionally fall back to v0. That is why I propose to start with an explicit white list as then we do not have to have a transition plan or otherwise work around the issue. Also it doesn't hurt now to use git config --global protocol.version v1,v0 instead compared to the proposed configuration above. (Even better yet, then people could play around with "v1 only" and see how it falls apart on old servers)
RE: [PATCH v6 12/12] fsmonitor: add a performance test
> -Original Message- > From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de] > Sent: Monday, September 18, 2017 10:25 AM > To: Ben Peart> Cc: david.tur...@twosigma.com; ava...@gmail.com; > christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; > pclo...@gmail.com; p...@peff.net > Subject: Re: [PATCH v6 12/12] fsmonitor: add a performance test > > Hi Ben, > > sorry for not catching this earlier: > > On Fri, 15 Sep 2017, Ben Peart wrote: > > > [...] > > + > > +int cmd_dropcaches(void) > > +{ > > + HANDLE hProcess = GetCurrentProcess(); > > + HANDLE hToken; > > + int status; > > + > > + if (!OpenProcessToken(hProcess, TOKEN_QUERY | > TOKEN_ADJUST_PRIVILEGES, &hToken)) > > + return error("Can't open current process token"); > > + > > + if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1)) > > + return error("Can't get SeProfileSingleProcessPrivilege"); > > + > > + CloseHandle(hToken); > > + > > + HMODULE ntdll = LoadLibrary("ntdll.dll"); > Thanks Johannes, I'll fix that. > Git's source code still tries to abide by C90, and for simplicity's sake, this > extends to the Windows-specific part. Therefore, the `ntdll` variable needs to > be declared at the beginning of the function (I do agree that it makes for > better code to reduce the scope of variables, but C90 simply did not allow > variables to be declared in the middle of functions). 
> > I wanted to send a patch addressing this in the obvious way, but then I > encountered these lines: > > > + DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) = > > + (DWORD(WINAPI *)(INT, PVOID, > ULONG))GetProcAddress(ntdll, "NtSetSystemInformation"); > > + if (!NtSetSystemInformation) > > + return error("Can't get function addresses, wrong Windows > > +version?"); > > It turns out that we have seen this plenty of times in Git for Windows' > fork, so much so that we came up with a nice helper to make this all a bit > more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and > INIT_PROC_ADDR helpers in compat/win32/lazyload.h. > > Maybe this would be the perfect excuse to integrate this patch into upstream > Git? This patch is pretty hefty already. How about you push this capability upstream and I take advantage of it in a later patch. :) This would be the patch (you can also cherry-pick it from > 25c4dc3a73352e72e995594cf1b4afa46e93d040 in > https://github.com/dscho/git): > > -- snip -- > From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 > 2001 > From: Johannes Schindelin > Date: Tue, 10 Jan 2017 23:14:20 +0100 > Subject: [PATCH] Win32: simplify loading of DLL functions > > Dynamic loading of DLL functions is duplicated in several places in Git for > Windows' source code. > > This patch adds a pair of macros to simplify the process: the > DECLARE_PROC_ADDR(, , , > ..) macro to be used at the beginning of a > code block, and the INIT_PROC_ADDR() macro to call before > using the declared function. The return value of the INIT_PROC_ADDR() call > has to be checked; if it is NULL, the function was not found in the specified > DLL. 
> > Example: > > DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, > LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); > > if (!INIT_PROC_ADDR(CreateHardLinkW)) > return error("Could not find CreateHardLinkW() function"; > > if (!CreateHardLinkW(source, target, NULL)) > return error("could not create hardlink from %S to %S", >source, target); > return 0; > > Signed-off-by: Karsten Blees > Signed-off-by: Johannes Schindelin > --- > compat/win32/lazyload.h | 44 > > 1 file changed, 44 insertions(+) > create mode 100644 compat/win32/lazyload.h > > diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h new file > mode 100644 index 000..91c10dad2fb > --- /dev/null > +++ b/compat/win32/lazyload.h > @@ -0,0 +1,44 @@ > +#ifndef LAZYLOAD_H > +#define LAZYLOAD_H > + > +/* simplify loading of DLL functions */ > + > +struct proc_addr { > + const char *const dll; > + const char *const function; > + FARPROC pfunction; > + unsigned initialized : 1; > +}; > + > +/* Declares a function to be loaded dynamically from a DLL. */ #define > +DECLARE_PROC_ADDR(dll, rettype, function, ...) \ > + static struct proc_addr proc_addr_##function = \ > + { #dll, #function, NULL, 0 }; \ > + static rettype (WINAPI *function)(__VA_ARGS__) > + > +/* > + * Loads a function from a DLL
Re: [RFC PATCH 0/2] Add named reference to latest push cert
On Mon, Sep 18, 2017 at 7:22 AM, Santiago Torreswrote: > Hello, everyone. > > Sorry for being late in this thread, I was making sure I didn't say > anything outrageously wrong. > >> That's Stefan; I wouldn't have suggested any approach that uses the >> blob whose sole purpose is to serve as a temporary storage area to >> pass the information to the hook as part of the permanent record. >> I put out a vague design that seemed to have more advantages in my mind at the time of writing. :) > > Hmm, as far as I understand *this* is the status quo. We get an > ephemeral sha1/oid only if you have a hook attached. Otherwise, you will > never see the object at all, even if you have --signed set. > > I suggested preparing this RFC/Patch because of the following reasons > (please let me know if my understanding of any of this is wrong...): > > - I find it weird that the cli allows for a --signed push and > nowhere in the receive-pack's feedback you're told if the push > certificate is compute/stored/handled at all. I think that, at the > latest, receive pack should let the user know whether the signed > push succeeded, or if there is no hook attached to handle it. How would a user benefit from it? (Are there cases where the organisation wants the user to not know deliberately what happened to their push cert? Do we care about these cases?) > - *if there is a hook* the blob is computed, but it is up to the > hook itself to store it *somewhere*. This makes me feel like it's > somewhat of a useless waste of computation if the hook is not > meant to handle it anyway(which is just a post-receive hook). I > find it rather weird that --signed is a builtin flag, and is > handled on the server side only partially (just my two cents). I agree, but many features in Git start out small and only partially. > - To me, the way push certificates are handled now feels like having > git commit -S producing a detached signature that you have to > embed somehow in the resulting commit object. 
(As a matter of > fact, many points on [1] seem to back this notion, and even recall > some drawbacks on push certificates the way they are handled > today) > > I understand the concurrency concerns, so I agree with Stefan's > solution, although I don't know how big of a deal it would be, if git > already supports --atomic pushes (admittedly, I haven't checked if there > are any guards in place for someone who pushes millions of refs > atomically). It'd be completely understandable to have a setting to > disable handling of --signed pushes and, ideally, a way to warn the user > of this feature being disabled if --signed is sent. That makes sense.
Re: [PATCH] pack: make packed_git_mru global a value instead of a pointer
On Sun, Sep 17, 2017 at 1:44 PM, Christian Couder wrote: > On Sun, Sep 17, 2017 at 10:42 PM, Christian Couder > wrote: >> On Sun, Sep 17, 2017 at 10:17 PM, phionah bugosi wrote: >>> Signed-off-by: phionah bugosi >>> --- >>> builtin/pack-objects.c | 5 +++-- >>> cache.h | 7 --- >>> list.h | 6 ++ >>> packfile.c | 12 ++-- >>> 4 files changed, 15 insertions(+), 15 deletions(-) >>> >>> This patch makes packed_git_mru global a value instead of a pointer and >>> makes use of list.h >> >> Please explain _why_ you are doing that (why is the resulting code better). > > Also the explanations should be in the commit message, that is above > your "Signed-off-by: ..." line. A similar patch exists at origin/jn/per-repo-object-store-fixes^^ (I haven't checked if there are differences and if so which patch is better, all I am noting here is that there has been work very similar to this one a couple days prior)
[PATCH v2] Improve performance of git status --ignored
Improve the performance of the directory listing logic when it wants to list non-empty ignored directories. In order to show non-empty ignored directories, the existing logic will recursively iterate through all contents of an ignored directory. This change introduces the optimization to stop iterating through the contents once it finds the first file. This can have a significant improvement in 'git status --ignored' performance in repositories with a large number of files in ignored directories. For an example of the performance difference on an example repository with 196,000 files in 400 ignored directories:

| Command                    | Time (s) |
| -------------------------- | -------- |
| git status                 | 1.2      |
| git status --ignored (old) | 3.9      |
| git status --ignored (new) | 1.4      |

Signed-off-by: Jameson Miller --- dir.c | 47 +-- 1 file changed, 41 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 1c55dc3..1d17b80 100644 --- a/dir.c +++ b/dir.c @@ -49,7 +49,7 @@ struct cached_dir { static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *path, int len, struct untracked_cache_dir *untracked, - int check_only, const struct pathspec *pathspec); + int check_only, int stop_at_first_file, const struct pathspec *pathspec); static int get_dtype(struct dirent *de, struct index_state *istate, const char *path, int len); @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + + /* +* If this is an excluded directory, then we only need to check if +* the directory contains any files. +*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, pathspec); + untracked, 1, exclude, pathspec); } /* @@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, * with check_only set. 
*/ return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, pathspec); + cdir->ucd, 1, 0, pathspec); /* * We get path_recurse in the first run when * directory_exists_in_index() returns index_nonexistent. We @@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir) * Also, we ignore the name ".git" (even if it is not a directory). * That likely will not change. + * + * If 'stop_at_first_file' is specified, 'path_excluded' is returned + * to signal that a file was found. This is the least significant value that + * indicates that a file was encountered that does not depend on the order of + * whether an untracked or excluded path was encountered first. + * + * Returns the most significant path_treatment value encountered in the scan. + * If 'stop_at_first_file' is specified, `path_excluded` is the most + * significant path_treatment value that will be returned. */ + static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *base, int baselen, struct untracked_cache_dir *untracked, int check_only, - const struct pathspec *pathspec) + int stop_at_first_file, const struct pathspec *pathspec) { struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; @@ -1832,12 +1845,34 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, subdir_state = read_directory_recursive(dir, istate, path.buf, path.len, ud, -check_only, pathspec); +check_only, stop_at_first_file, pathspec); if (subdir_state > dir_state) dir_state = subdir_state; } if (check_only) { + if (stop_at_first_file) { + /* +* If stopping at first file, then +* signal that a file was found by +* returning `path_excluded`. This is +* to return a consistent value +* regardless of whether an ignored or +* excluded file happened to be +* encountered 1st. +
[PATCH v2] Improve performance of git status --ignored
This is the second version of my patches to improve handling of ignored files. I have decided to break the original patch series into two parts:

1) Perf improvements to handling ignored directories
2) Expose extra options to control which ignored files are displayed by git status.

This patch will address #1, and I will follow up with another patch for #2.

This patch improves the performance of 'git status --ignored' by cutting out unnecessary work when determining if an ignored directory is empty or not. The current logic will recursively enumerate through all contents of an ignored directory to determine whether it is empty or not. The new logic will return after the first file is encountered. This can result in a significant speedup in work dirs with a lot of files in ignored directories.

As an example of the performance improvement, here is a representative example of a repository with ~190,000 files in ~400 ignored directories:

| Command                    | Time (s) |
| -------------------------- | -------- |
| git status                 | 1.2      |
| git status --ignored (old) | 3.9      |
| git status --ignored (new) | 1.4      |

Jameson Miller (1):
  Improve performance of git status --ignored

 dir.c | 47 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

-- 
2.7.4
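The early-exit idea in this series is easy to see in miniature. The sketch below is not git's dir.c — the types and names are invented for illustration — but it contrasts the old behaviour (enumerate everything under an ignored directory) with the new one (stop at the first file that proves the directory is non-empty):

```c
#include <assert.h>
#include <stddef.h>

/* Toy directory tree: an entry is either a file or a subdirectory. */
struct entry {
	int is_dir;
	const struct entry *children;	/* used when is_dir */
	size_t n_children;
};

/* Old behaviour: visit every entry under 'dir', counting files. */
static size_t count_files(const struct entry *dir)
{
	size_t n = 0;
	for (size_t i = 0; i < dir->n_children; i++) {
		const struct entry *e = &dir->children[i];
		n += e->is_dir ? count_files(e) : 1;
	}
	return n;
}

/* New behaviour: return as soon as one file proves the dir is non-empty. */
static int contains_any_file(const struct entry *dir)
{
	for (size_t i = 0; i < dir->n_children; i++) {
		const struct entry *e = &dir->children[i];
		if (!e->is_dir)
			return 1;		/* first file found: stop */
		if (contains_any_file(e))
			return 1;		/* some subdirectory has a file */
	}
	return 0;
}

/* fixture: root/{sub/{file}, file, file}, plus an empty directory */
static const struct entry sub_children[] = { { 0, NULL, 0 } };
static const struct entry root_children[] = {
	{ 1, sub_children, 1 }, { 0, NULL, 0 }, { 0, NULL, 0 },
};
static const struct entry fixture_root = { 1, root_children, 3 };
static const struct entry empty_root = { 1, NULL, 0 };
```

Both functions agree on "is it empty?", but the second does O(1) work per query in the common case of a directory whose first entry is a file.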
[OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Just to re-echo a previous change requested before in one of the mail threads, we currently have two global variables declared:

	struct mru packed_git_mru_storage;
	struct mru *packed_git_mru = &packed_git_mru_storage;

We normally use pointers in C to point or refer to the same location or space in memory from multiple places. That means in situations where we will have the pointer be assigned to in many places in our code. But in this case, we do not have any other logic referring or assigning to the pointer packed_git_mru. In simple words, we actually don't need a pointer in this case.

Therefore this patch makes the packed_git_mru global a value instead of a pointer, and makes use of list.h.

Signed-off-by: phionah bugosi
---
 builtin/pack-objects.c |  5 +++--
 cache.h                |  7 -------
 list.h                 |  6 ++++++
 packfile.c             | 12 ++++++------
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a57b4f0..189123f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "list.h"
 #include "config.h"
 #include "attr.h"
 #include "object.h"
@@ -1012,7 +1013,7 @@ static int want_object_in_pack(const unsigned char *sha1,
 		return want;
 	}
 
-	for (entry = packed_git_mru->head; entry; entry = entry->next) {
+	for (entry = packed_git_mru.head; entry; entry = entry->next) {
 		struct packed_git *p = entry->item;
 		off_t offset;
 
@@ -1030,7 +1031,7 @@ static int want_object_in_pack(const unsigned char *sha1,
 		}
 		want = want_found_object(exclude, p);
 		if (!exclude && want > 0)
-			mru_mark(packed_git_mru, entry);
+			mru_mark(&packed_git_mru, entry);
 		if (want != -1)
 			return want;
 	}
diff --git a/cache.h b/cache.h
index a916bc7..c8d7086 100644
--- a/cache.h
+++ b/cache.h
@@ -1585,13 +1585,6 @@ extern struct packed_git {
 	char pack_name[FLEX_ARRAY]; /* more */
 } *packed_git;
 
-/*
- * A most-recently-used ordered version of the packed_git list, which can
- * be iterated instead of packed_git (and marked via mru_mark).
- */
-struct mru;
-extern struct mru *packed_git_mru;
-
 struct pack_entry {
 	off_t offset;
 	unsigned char sha1[20];
diff --git a/list.h b/list.h
index a226a87..525703b 100644
--- a/list.h
+++ b/list.h
@@ -26,6 +26,12 @@
 #define LIST_H 1
 
 /*
+ * A most-recently-used ordered version of the packed_git list, which can
+ * be iterated instead of packed_git (and marked via mru_mark).
+ */
+extern struct mru packed_git_mru;
+
+/*
  * The definitions of this file are adopted from those which can be
  * found in the Linux kernel headers to enable people familiar with the
  * latter find their way in these sources as well.
diff --git a/packfile.c b/packfile.c
index f86fa05..61a61aa 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "list.h"
 #include "mru.h"
 #include "pack.h"
 #include "dir.h"
@@ -41,8 +42,7 @@ static size_t peak_pack_mapped;
 static size_t pack_mapped;
 struct packed_git *packed_git;
 
-static struct mru packed_git_mru_storage;
-struct mru *packed_git_mru = &packed_git_mru_storage;
+struct mru packed_git_mru;
 
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
@@ -861,9 +861,9 @@ static void prepare_packed_git_mru(void)
 {
 	struct packed_git *p;
 
-	mru_clear(packed_git_mru);
+	mru_clear(&packed_git_mru);
 	for (p = packed_git; p; p = p->next)
-		mru_append(packed_git_mru, p);
+		mru_append(&packed_git_mru, p);
 }
 
 static int prepare_packed_git_run_once = 0;
@@ -1832,9 +1832,9 @@ int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
 	if (!packed_git)
 		return 0;
 
-	for (p = packed_git_mru->head; p; p = p->next) {
+	for (p = packed_git_mru.head; p; p = p->next) {
 		if (fill_pack_entry(sha1, e, p->item)) {
-			mru_mark(packed_git_mru, p);
+			mru_mark(&packed_git_mru, p);
 			return 1;
 		}
 	}
-- 
2.7.4
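For readers following the pointer-vs-value discussion, the shape of the change is illustrated below. This is a hypothetical, minimal MRU list, not git's mru.c or list.h; the point is only that once the global is a value, every call site simply takes its address with `&` instead of going through a second, pointer-typed global:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for an MRU list (invented, not git's types). */
struct mru_entry { void *item; struct mru_entry *next; };
struct mru { struct mru_entry *head; };

static struct mru packed_mru;	/* one value global, no pointer alias */

/* Append at the tail, preserving insertion order. */
static void mru_append(struct mru *mru, struct mru_entry *e)
{
	struct mru_entry **p = &mru->head;
	while (*p)
		p = &(*p)->next;
	e->next = NULL;
	*p = e;
}

/* Move a just-used entry to the front of the list. */
static void mru_mark(struct mru *mru, struct mru_entry *e)
{
	struct mru_entry **p = &mru->head;
	while (*p && *p != e)
		p = &(*p)->next;
	if (*p)
		*p = e->next;	/* detach */
	e->next = mru->head;
	mru->head = e;		/* most recently used first */
}

static struct mru_entry entry_a = { "a", NULL };
static struct mru_entry entry_b = { "b", NULL };

/* Append a then b, mark b as used, and report the item now at the head. */
static const char *demo_head(void)
{
	mru_append(&packed_mru, &entry_a);	/* note the '&': the global is a value */
	mru_append(&packed_mru, &entry_b);
	mru_mark(&packed_mru, &entry_b);
	return packed_mru.head->item;
}
```

The conversion suggested in the thread goes one step further and replaces this hand-rolled list with the list.h intrusive-list helpers, but the value-global call-site pattern is the same.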
[PATCH] rev-parse: add --is-shallow-repository
Running `git fetch --unshallow` on a repo that is not in fact shallow produces a fatal error message. Add a helper to rev-parse that scripters can use to determine whether a repo is shallow or not. Signed-off-by: Øystein Walle--- Documentation/git-rev-parse.txt | 3 +++ builtin/rev-parse.c | 5 + t/t1500-rev-parse.sh| 15 +++ 3 files changed, 23 insertions(+) diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt index b1293f24b..0917b8207 100644 --- a/Documentation/git-rev-parse.txt +++ b/Documentation/git-rev-parse.txt @@ -235,6 +235,9 @@ print a message to stderr and exit with nonzero status. --is-bare-repository:: When the repository is bare print "true", otherwise "false". +--is-shallow-repository:: + When the repository is shallow print "true", otherwise "false". + --resolve-git-dir :: Check if is a valid repository or a gitfile that points at a valid repository, and print the location of the diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 2bd28d3c0..c923207f2 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) : "false"); continue; } + if (!strcmp(arg, "--is-shallow-repository")) { + printf("%s\n", is_repository_shallow() ? "true" + : "false"); + continue; + } if (!strcmp(arg, "--shared-index-path")) { if (read_cache() < 0) die(_("Could not read the index")); diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index 03d3c7f6d..9d3433a30 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' ' test_cmp expect actual ' +test_expect_success 'git-path shallow repository' ' + test_commit test_commit && + echo true >expect && + git clone --depth 1 --no-local . 
shallow && + test_when_finished "rm -rf shallow" && + git -C shallow rev-parse --is-shallow-repository >actual && + test_cmp expect actual +' + +test_expect_success 'git-path notshallow repository' ' + echo false >expect && + git rev-parse --is-shallow-repository >actual && + test_cmp expect actual +' + test_expect_success 'showing the superproject correctly' ' git rev-parse --show-superproject-working-tree >out && test_must_be_empty out && -- 2.11.0.485.g4e59582
Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms
On 09/13, Stefan Beller wrote: > On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williamswrote: > > Create protocol.{c,h} and provide functions which future servers and > > clients can use to determine which protocol to use or is being used. > > > > Also introduce the 'GIT_PROTOCOL' environment variable which will be > > used to communicate a colon separated list of keys with optional values > > to a server. Unknown keys and values must be tolerated. This mechanism > > is used to communicate which version of the wire protocol a client would > > like to use with a server. > > > > Signed-off-by: Brandon Williams > > --- > > Documentation/config.txt | 16 +++ > > Documentation/git.txt| 5 > > Makefile | 1 + > > cache.h | 7 + > > protocol.c | 72 > > > > protocol.h | 15 ++ > > 6 files changed, 116 insertions(+) > > create mode 100644 protocol.c > > create mode 100644 protocol.h > > > > diff --git a/Documentation/config.txt b/Documentation/config.txt > > index dc4e3f58a..d5b28a32c 100644 > > --- a/Documentation/config.txt > > +++ b/Documentation/config.txt > > @@ -2517,6 +2517,22 @@ The protocol names currently used by git are: > > `hg` to allow the `git-remote-hg` helper) > > -- > > > > +protocol.version:: > > It would be cool to set a set of versions that are good. (I am not sure > if that can be deferred to a later patch.) > > Consider we'd have versions 0,1,2,3,4 in the future: > In an ideal world the client and server would talk using v4 > as it is the most advanced protocol, right? > Maybe a security/performance issue is found on the server side > with say protocol v3. Still no big deal as we are speaking v4. > But then consider an issue is found on the client side with v4. > Then the client would happily talk 0..3 while the server would > love to talk using 0,1,2,4. 
> > The way I think about protocol version negotiation is that > both parties involved have a set of versions that they tolerate > to talk (they might understand more than the tolerated set, but the > user forbade some), and the goal of the negotiation is to find > the highest version number that is part of both the server set > and client set. So quite naturally with this line of thinking the > configuration is to configure a set of versions, which is what > I propose here. Maybe even in the wire format, separated > with colons? I'm sure it wouldn't take too much to change this to be a multi-valued config. Because after this series there is just v0 and v1 I didn't think through this case too much. If others agree then I can go ahead and make it so in a reroll. > > > + If set, clients will attempt to communicate with a server using > > + the specified protocol version. If unset, no attempt will be > > + made by the client to communicate using a particular protocol > > + version, this results in protocol version 0 being used. > > This sounds as if we're going to be really shy at first and only > users that care will try out new versions at their own risk. > From a users POV this may be frustrating as I would imagine that > people want to run > > git config --global protocol.version 2 > > to try it out and then realize that some of their hosts do not speak > 2, so they have to actually configure it per repo/remote. The point would be to be able to set this globally, not per-repo. Even if a repo doesn't speak v2 then it should be able to gracefully degrade to v1 without the user having to do anything. The reason why there is this escape hatch is if doing the protocol negotiation out of band causing issues with communicating with a server that it can be shut off. > > + Supported versions: > > > +* `0` - the original wire protocol. > > In the future this may be misleading as it doesn't specify the date of > when it was original. e.g. 
are capabilities already supported in "original"? > > Maybe phrase it as "wire protocol as of v2.14" ? (Though this sounds > as if new capabilities added in the future are not allowed) Yeah I can see how this could be misleading, though I'm not sure how best to word it to avoid that. > > > > + > > +extern enum protocol_version parse_protocol_version(const char *value); > > +extern enum protocol_version get_protocol_version_config(void); > > +extern enum protocol_version determine_protocol_version_server(void); > > +extern enum protocol_version determine_protocol_version_client(const char > > *server_response); > > Here is a good place to have some comments. -- Brandon Williams
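Stefan's set-based negotiation idea can be sketched directly. The function below is illustrative only (it is not part of protocol.c, and the name is invented): each side tolerates a set of protocol versions, and the negotiated result is the highest version present in both sets, or a failure value when the sets do not intersect:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the highest protocol version tolerated by both the client and
 * the server, or -1 when no common version exists.
 */
static int negotiate_version(const int *client, size_t nclient,
			     const int *server, size_t nserver)
{
	int best = -1;
	for (size_t i = 0; i < nclient; i++)
		for (size_t j = 0; j < nserver; j++)
			if (client[i] == server[j] && client[i] > best)
				best = client[i];
	return best;
}
```

With the example from the mail — client tolerates {0,1,2,3} after a client-side issue with v4, server tolerates {0,1,2,4} after a server-side issue with v3 — both sides happily settle on v2.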
Re: [PATCH 3/8] daemon: recognize hidden request arguments
On 09/13, Stefan Beller wrote: > On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williamswrote: > > A normal request to git-daemon is structured as > > "command path/to/repo\0host=..\0" and due to a bug in an old version of > > git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the > > command, 2009-06-04) we aren't able to place any extra args (separated > > by NULs) besides the host. > > > > In order to get around this limitation teach git-daemon to recognize > > additional request arguments hidden behind a second NUL byte. Requests > > can then be structured like: > > "command path/to/repo\0host=..\0\0version=1\0key=value\0". git-daemon > > can then parse out the extra arguments and set 'GIT_PROTOCOL' > > accordingly. > > > > Signed-off-by: Brandon Williams > > --- > > daemon.c | 71 > > +++- > > 1 file changed, 61 insertions(+), 10 deletions(-) > > > > diff --git a/daemon.c b/daemon.c > > index 30747075f..250dbf82c 100644 > > --- a/daemon.c > > +++ b/daemon.c > > @@ -282,7 +282,7 @@ static const char *path_ok(const char *directory, > > struct hostinfo *hi) > > return NULL;/* Fallthrough. 
Deny by default */ > > } > > > > -typedef int (*daemon_service_fn)(void); > > +typedef int (*daemon_service_fn)(const struct argv_array *env); > > struct daemon_service { > > const char *name; > > const char *config_name; > > @@ -363,7 +363,7 @@ static int run_access_hook(struct daemon_service > > *service, const char *dir, > > } > > > > static int run_service(const char *dir, struct daemon_service *service, > > - struct hostinfo *hi) > > + struct hostinfo *hi, const struct argv_array *env) > > { > > const char *path; > > int enabled = service->enabled; > > @@ -422,7 +422,7 @@ static int run_service(const char *dir, struct > > daemon_service *service, > > */ > > signal(SIGTERM, SIG_IGN); > > > > - return service->fn(); > > + return service->fn(env); > > } > > > > static void copy_to_log(int fd) > > @@ -462,25 +462,34 @@ static int run_service_command(struct child_process > > *cld) > > return finish_command(cld); > > } > > > > -static int upload_pack(void) > > +static int upload_pack(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_pushl(, "upload-pack", "--strict", NULL); > > argv_array_pushf(, "--timeout=%u", timeout); > > + > > + argv_array_pushv(_array, env->argv); > > + > > return run_service_command(); > > } > > > > -static int upload_archive(void) > > +static int upload_archive(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_push(, "upload-archive"); > > + > > + argv_array_pushv(_array, env->argv); > > + > > return run_service_command(); > > } > > > > -static int receive_pack(void) > > +static int receive_pack(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_push(, "receive-pack"); > > + > > + argv_array_pushv(_array, env->argv); > > + > > return run_service_command(); > > } > > > > @@ -574,7 +583,7 @@ static void canonicalize_client(struct strbuf *out, > > const char *in) > > /* > > * Read the host as 
supplied by the client connection. > > */ > > -static void parse_host_arg(struct hostinfo *hi, char *extra_args, int > > buflen) > > +static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int > > buflen) > > { > > char *val; > > int vallen; > > @@ -602,6 +611,39 @@ static void parse_host_arg(struct hostinfo *hi, char > > *extra_args, int buflen) > > if (extra_args < end && *extra_args) > > die("Invalid request"); > > } > > + > > + return extra_args; > > +} > > + > > +static void parse_extra_args(struct argv_array *env, const char > > *extra_args, > > +int buflen) > > +{ > > + const char *end = extra_args + buflen; > > + struct strbuf git_protocol = STRBUF_INIT; > > + > > + for (; extra_args < end; extra_args += strlen(extra_args) + 1) { > > + const char *arg = extra_args; > > + > > + /* > > +* Parse the extra arguments, adding most to 'git_protocol' > > +* which will be used to set the 'GIT_PROTOCOL' envvar in > > the > > +* service that will be run. > > +* > > +* If there ends up being a particular arg in the future > > that > > +* git-daemon needs to parse specificly (like the 'host' > > arg) > > +* then it can be parsed here and not added to > > 'git_protocol'. > > +*/ > > + if (*arg) { > >
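The hidden-argument format is easier to see with a standalone parser sketch. Nothing below is daemon.c code — `join_extra_args` is an invented name — but it walks a NUL-separated argument region the same way the patch does and joins the non-empty args with colons, which is the shape used for the GIT_PROTOCOL value:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * 'extra' points just past the second NUL of a request like
 * "cmd path\0host=..\0\0version=1\0key=val\0"; 'len' covers the region
 * up to (and including) the final NUL. Join the args with ':' into 'out'.
 */
static void join_extra_args(const char *extra, size_t len,
			    char *out, size_t outlen)
{
	const char *end = extra + len;
	size_t used = 0;

	out[0] = '\0';
	for (; extra < end; extra += strlen(extra) + 1) {
		size_t n = strlen(extra);
		if (!n)
			continue;		/* skip empty slots */
		if (used + n + 2 > outlen)
			break;			/* out of room; a real server would error out */
		if (used)
			out[used++] = ':';	/* GIT_PROTOCOL is a colon-separated list */
		memcpy(out + used, extra, n);
		used += n;
		out[used] = '\0';
	}
}
```

Feeding it the extra region `"version=1\0key=val\0"` yields `version=1:key=val`, which is what the service would see in its GIT_PROTOCOL environment variable.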
Re: [PATCH 2/4] push, fetch: error out for submodule entries not pointing to commits
On 09/12, Jonathan Nieder wrote: > From: Stefan Beller> > The check_has_commit helper uses resolves a submodule entry to a > commit, when validating its existence. As a side effect this means > tolerates a submodule entry pointing to a tag, which is not a valid > submodule entry that git commands would know how to cope with. > > Tighten the check to require an actual commit, not a tag pointing to a > commit. > > Also improve the error handling when a submodule entry points to > non-commit (e.g., a blob) to error out instead of warning and > pretending the pointed to object doesn't exist. > > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Looks good! > --- > submodule.c| 33 + > t/t5531-deep-submodule-push.sh | 10 ++ > 2 files changed, 35 insertions(+), 8 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3cea8221e0..e0da55920d 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -767,19 +767,36 @@ static int append_oid_to_argv(const struct object_id > *oid, void *data) > return 0; > } > > +struct has_commit_data { > + int result; > + const char *path; > +}; > + > static int check_has_commit(const struct object_id *oid, void *data) > { > - int *has_commit = data; > + struct has_commit_data *cb = data; > > - if (!lookup_commit_reference(oid)) > - *has_commit = 0; > + enum object_type type = sha1_object_info(oid->hash, NULL); > > - return 0; > + switch (type) { > + case OBJ_COMMIT: > + return 0; > + case OBJ_BAD: > + /* > + * Object is missing or invalid. If invalid, an error message > + * has already been printed. > + */ > + cb->result = 0; > + return 0; > + default: > + die(_("submodule entry '%s' (%s) is a %s, not a commit"), > + cb->path, oid_to_hex(oid), typename(type)); > + } > } > > static int submodule_has_commits(const char *path, struct oid_array *commits) > { > - int has_commit = 1; > + struct has_commit_data has_commit = { 1, path }; > > /* >* Perform a cheap, but incorrect check for the existence of 'commits'. 
> @@ -795,7 +812,7 @@ static int submodule_has_commits(const char *path, struct > oid_array *commits) > > oid_array_for_each_unique(commits, check_has_commit, _commit); > > - if (has_commit) { > + if (has_commit.result) { > /* >* Even if the submodule is checked out and the commit is >* present, make sure it exists in the submodule's object store > @@ -814,12 +831,12 @@ static int submodule_has_commits(const char *path, > struct oid_array *commits) > cp.dir = path; > > if (capture_command(, , GIT_MAX_HEXSZ + 1) || out.len) > - has_commit = 0; > + has_commit.result = 0; > > strbuf_release(); > } > > - return has_commit; > + return has_commit.result; > } > > static int submodule_needs_pushing(const char *path, struct oid_array > *commits) > diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh > index 0f84a53146..39cb2c1c34 100755 > --- a/t/t5531-deep-submodule-push.sh > +++ b/t/t5531-deep-submodule-push.sh > @@ -298,6 +298,16 @@ test_expect_success 'push succeeds if submodule commit > disabling recursion from > ) > ' > > +test_expect_success 'submodule entry pointing at a tag is error' ' > + git -C work/gar/bage tag -a test1 -m "tag" && > + tag=$(git -C work/gar/bage rev-parse test1^{tag}) && > + git -C work update-index --cacheinfo 16 "$tag" gar/bage && > + git -C work commit -m "bad commit" && > + test_when_finished "git -C work reset --hard HEAD^" && > + test_must_fail git -C work push --recurse-submodules=on-demand > ../pub.git master 2>err && > + test_i18ngrep "is a tag, not a commit" err > +' > + > test_expect_success 'push fails if recurse submodules option passed as yes' ' > ( > cd work/gar/bage && > -- > 2.14.1.690.gbb1197296e > -- Brandon Williams
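The `has_commit_data` change above is an instance of a common C pattern: widening a callback's `void *` cookie from a bare `int *` to a struct, so the callback can carry context in and report more than one thing out. A minimal illustration with invented names (this is not submodule.c):

```c
#include <assert.h>
#include <stddef.h>

/* The cookie: a result flag plus extra context for error reporting. */
struct scan_data {
	int result;	/* 1 while everything looks valid */
	int bad_index;	/* where validation first failed, or -1 */
};

typedef int (*visit_fn)(int value, int index, void *data);

/* A for_each-style iterator: stops when the callback returns nonzero. */
static void for_each_value(const int *v, int n, visit_fn fn, void *data)
{
	for (int i = 0; i < n; i++)
		if (fn(v[i], i, data))
			break;
}

/* The callback: record both that validation failed and where. */
static int check_nonnegative(int value, int index, void *data)
{
	struct scan_data *cb = data;
	if (value < 0) {
		cb->result = 0;
		cb->bad_index = index;
		return 1;	/* stop iterating */
	}
	return 0;
}
```

In the patch the extra field is the submodule path, so `check_has_commit` can name the offending entry in its die() message instead of only flipping a flag.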
Re: [RFC PATCH v2 1/2] implement fetching of moved submodules
On 09/15, Heiko Voigt wrote: > We store the changed submodules paths to calculate which submodule needs > fetching. This does not work for moved submodules since their paths do > not stay the same in case of a moved submodules. In case of new > submodules we do not have a path in the current checkout, since they > just appeared in this fetch. > > It is more general to collect the submodule names for changes instead of > their paths to include the above cases. > > With the change described above we implement 'on-demand' fetching of > changes in moved submodules. > > Note: This does only work when repositories have a .gitmodules file. In > other words: It breaks if we do not get a name for a repository. > IIRC, consensus was that this is a requirement to get nice submodule > handling these days? > > NEEDSWORK: This breaks t5531 and t5545 because they do not use a > .gitmodules file. I will add a fallback to paths to help such users. > > Signed-off-by: Heiko Voigt> --- > This an update of the previous series[1] to the current master. The > fallback is still missing but now it should not conflict with any topics > in flight anymore (hopefully). So the idea is to collect changed submodule's name, instead of their path, so that if they happened to moved you don't have to worry about the path changing underneath you. This should be good once those tests get fixed. Thanks for working on cleaning this up! 
:) > > Cheers Heiko > > [1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/ > > submodule.c | 91 > + > t/t5526-fetch-submodules.sh | 35 + > 2 files changed, 85 insertions(+), 41 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3cea8221e0..38b9905e43 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -21,7 +21,7 @@ > #include "parse-options.h" > > static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; > -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP; > +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; > static int initialized_fetch_ref_tips; > static struct oid_array ref_tips_before_fetch; > static struct oid_array ref_tips_after_fetch; > @@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct > cache_entry *ce) > } > > static struct oid_array *submodule_commits(struct string_list *submodules, > -const char *path) > +const char *name) > { > struct string_list_item *item; > > - item = string_list_insert(submodules, path); > + item = string_list_insert(submodules, name); > if (item->util) > return (struct oid_array *) item->util; > > @@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct > string_list *submodules, > return (struct oid_array *) item->util; > } > > +struct collect_changed_submodules_cb_data { > + struct string_list *changed; > + const struct object_id *commit_oid; > +}; > + > static void collect_changed_submodules_cb(struct diff_queue_struct *q, > struct diff_options *options, > void *data) > { > + struct collect_changed_submodules_cb_data *me = data; > + struct string_list *changed = me->changed; > + const struct object_id *commit_oid = me->commit_oid; > int i; > - struct string_list *changed = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > struct oid_array *commits; > + const struct submodule *submodule; > + > if (!S_ISGITLINK(p->two->mode)) > continue; > > - if 
(S_ISGITLINK(p->one->mode)) { > - /* > - * NEEDSWORK: We should honor the name configured in > - * the .gitmodules file of the commit we are examining > - * here to be able to correctly follow submodules > - * being moved around. > - */ > - commits = submodule_commits(changed, p->two->path); > - oid_array_append(commits, >two->oid); > - } else { > - /* Submodule is new or was moved here */ > - /* > - * NEEDSWORK: When the .git directories of submodules > - * live inside the superprojects .git directory some > - * day we should fetch new submodules directly into > - * that location too when config or options request > - * that so they can be checked out from there. > - */ > + submodule = submodule_from_path(commit_oid, p->two->path); > + if
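The core idea of the series — key the "changed submodules" list by name, which survives a move, rather than by path, which does not — can be shown with a tiny lookup sketch (invented names, not submodule.c):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * A changed submodule, recorded under its stable .gitmodules name.
 * The path is whatever the current superproject commit happens to use.
 */
struct changed_sub {
	const char *name;	/* stable across moves */
	const char *path;	/* may differ between fetched commits */
};

static const struct changed_sub *lookup_by_name(const struct changed_sub *tab,
						size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(tab[i].name, name))
			return &tab[i];
	return NULL;
}
```

A submodule recorded as name "libfoo" is still found after its path moves, which is exactly the case the old path-keyed list got wrong; the fallback Heiko mentions is needed only when no .gitmodules entry provides a name.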
Re: [RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch
On 09/15, Heiko Voigt wrote: > To make extending this logic later easier. > > Signed-off-by: Heiko VoigtI like the result of this patch, its much cleaner. > --- > submodule.c | 74 > ++--- > 1 file changed, 37 insertions(+), 37 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 38b9905e43..fa44fc59f2 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch { > }; > #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0} > > +static int get_fetch_recurse_config(const struct submodule *submodule, > + struct submodule_parallel_fetch *spf) > +{ > + if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT) > + return spf->command_line_option; > + > + if (submodule) { > + char *key; > + const char *value; > + > + int fetch_recurse = submodule->fetch_recurse; > + key = xstrfmt("submodule.%s.fetchRecurseSubmodules", > submodule->name); > + if (!repo_config_get_string_const(the_repository, key, )) > { > + fetch_recurse = parse_fetch_recurse_submodules_arg(key, > value); > + } > + free(key); > + > + if (fetch_recurse != RECURSE_SUBMODULES_NONE) > + /* local config overrules everything except commandline > */ > + return fetch_recurse; > + } > + > + return spf->default_option; > +} > + > static int get_next_submodule(struct child_process *cp, > struct strbuf *err, void *data, void **task_cb) > { > @@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process > *cp, > > submodule = submodule_from_path(_oid, ce->name); > > - default_argv = "yes"; > - if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) { > - int fetch_recurse = RECURSE_SUBMODULES_NONE; > - > - if (submodule) { > - char *key; > - const char *value; > - > - fetch_recurse = submodule->fetch_recurse; > - key = > xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name); > - if > (!repo_config_get_string_const(the_repository, key, )) { > - fetch_recurse = > parse_fetch_recurse_submodules_arg(key, value); > - } > - free(key); 
> - } > - > - if (fetch_recurse != RECURSE_SUBMODULES_NONE) { > - if (fetch_recurse == RECURSE_SUBMODULES_OFF) > - continue; > - if (fetch_recurse == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } else { > - if (spf->default_option == > RECURSE_SUBMODULES_OFF) > - continue; > - if (spf->default_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } > - } else if (spf->command_line_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(_submodule_names, > + switch (get_fetch_recurse_config(submodule, spf)) > + { > + default: > + case RECURSE_SUBMODULES_DEFAULT: > + case RECURSE_SUBMODULES_ON_DEMAND: > + if (!submodule || > !unsorted_string_list_lookup(_submodule_names, >submodule->name)) > continue; > default_argv = "on-demand"; > + break; > + case RECURSE_SUBMODULES_ON: > + default_argv = "yes"; > + break; > + case RECURSE_SUBMODULES_OFF: > + continue; > } > > strbuf_addf(_path, "%s/%s", spf->work_tree, ce->name); > -- > 2.14.1.145.gb3622a4 > -- Brandon Williams
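The simplified decision tree reads naturally as a single switch. The sketch below mirrors its logic with invented names (it is not the submodule.c code): default and on-demand modes skip submodules with nothing new, "on" always fetches, and "off" never does:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum recurse_mode {
	RECURSE_DEFAULT,
	RECURSE_ON_DEMAND,
	RECURSE_ON,
	RECURSE_OFF,
};

/* Returns the recursion argument to pass to the child fetch,
 * or NULL when this submodule should be skipped entirely. */
static const char *fetch_argv_for(enum recurse_mode mode, int submodule_changed)
{
	switch (mode) {
	default:
	case RECURSE_DEFAULT:
	case RECURSE_ON_DEMAND:
		if (!submodule_changed)
			return NULL;		/* nothing new: skip */
		return "on-demand";
	case RECURSE_ON:
		return "yes";			/* always fetch */
	case RECURSE_OFF:
		return NULL;			/* never fetch */
	}
}
```

Stacking `default:` with the on-demand cases is what lets the patch delete the two duplicated if/else ladders: the precedence (command line over per-submodule config over default) is resolved once, in `get_fetch_recurse_config()`, before the switch runs.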
Re: [PATCH] protocol: make parse_protocol_version() private
On 09/17, Ramsay Jones wrote: > > Signed-off-by: Ramsay Jones> --- > > Hi Brandon, > > If you need to re-roll your 'bw/protocol-v1' branch, could you please > squash this into the relevant patch (commit 45954f179e, "protocol: > introduce protocol extention mechanisms", 13-09-2017). > > This assumes you agree that this symbol does not need to be public; if > not, then please just ignore! ;-) Thanks! I've updated my local version of the series to reflect this. > > Thanks! > > ATB, > Ramsay Jones > > protocol.c | 2 +- > protocol.h | 1 - > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/protocol.c b/protocol.c > index 1b16c7b9a..369503065 100644 > --- a/protocol.c > +++ b/protocol.c > @@ -2,7 +2,7 @@ > #include "config.h" > #include "protocol.h" > > -enum protocol_version parse_protocol_version(const char *value) > +static enum protocol_version parse_protocol_version(const char *value) > { > if (!strcmp(value, "0")) > return protocol_v0; > diff --git a/protocol.h b/protocol.h > index 2fa6486d0..18f9a5235 100644 > --- a/protocol.h > +++ b/protocol.h > @@ -7,7 +7,6 @@ enum protocol_version { > protocol_v1 = 1, > }; > > -extern enum protocol_version parse_protocol_version(const char *value); > extern enum protocol_version get_protocol_version_config(void); > extern enum protocol_version determine_protocol_version_server(void); > extern enum protocol_version determine_protocol_version_client(const char > *server_response); > -- > 2.14.0 -- Brandon Williams
RE: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
> -Original Message- > From: Torsten Bögershausen [mailto:tbo...@web.de] > Sent: Monday, September 18, 2017 11:43 AM > To: Ben Peart; Junio C Hamano > ; Ben Peart > Cc: david.tur...@twosigma.com; ava...@gmail.com; > christian.cou...@gmail.com; git@vger.kernel.org; > johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > Subject: Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index > extension > > On 2017-09-18 15:38, Ben Peart wrote: > > > > > > On 9/17/2017 4:02 AM, Junio C Hamano wrote: > >> Ben Peart writes: > >> > >>> diff --git a/t/helper/test-dump-fsmonitor.c > >>> b/t/helper/test-dump-fsmonitor.c new file mode 100644 index > >>> 00..482d749bb9 > >>> --- /dev/null > >>> +++ b/t/helper/test-dump-fsmonitor.c > >>> @@ -0,0 +1,21 @@ > >>> +#include "cache.h" > >>> + > >>> +int cmd_main(int ac, const char **av) { > >>> + struct index_state *istate = _index; > >>> + int i; > >>> + > >>> + setup_git_directory(); > >>> + if (do_read_index(istate, get_index_file(), 0) < 0) > >>> + die("unable to read index file"); > >>> + if (!istate->fsmonitor_last_update) { > >>> + printf("no fsmonitor\n"); > >>> + return 0; > >>> + } > >>> + printf("fsmonitor last update %"PRIuMAX"\n", > >>> istate->fsmonitor_last_update); > >> > >> After pushing this out and had Travis complain, I queued a squash on > >> top of this to cast the argument to (uintmax_t), like you did in an > >> earlier step (I think it was [PATCH 04/12]). > >> > > > > Thanks. I'll update this to cast it as (uint64_t) as that is what > > get/put_be64 use. As far as I can tell they both map to the same > > thing (unsigned long long) so there isn't functional difference. > (Just to double-check): This is the way to print "PRIuMAX" correctly (on all > platforms): > > printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate- > >fsmonitor_last_update); > Should I just make the variable type itself uintmax_t and then just skip the cast altogether? 
I went with uint64_t because that is what getnanotime returned.
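For reference, the portable pattern under discussion looks like this. The helper name is invented, but the combination — cast the argument to `uintmax_t` and print with the standard `PRIuMAX` macro from `<inttypes.h>` — is the usual way to keep the format string and the argument in agreement on every platform, which is what the Travis build complained about:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit timestamp portably: PRIuMAX expects a uintmax_t
 * argument, so the uint64_t value is cast at the call site. */
static void format_last_update(uint64_t last_update, char *out, size_t outlen)
{
	snprintf(out, outlen, "fsmonitor last update %" PRIuMAX,
		 (uintmax_t)last_update);
}
```

Using `uint64_t` with `PRIu64` would also be self-consistent; the cast-to-`uintmax_t` form just matches the convention already used elsewhere in the series.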
[PATCH 2/2] read_info_alternates: warn on non-trivial errors
When we fail to open $GIT_DIR/info/alternates, we silently assume there are no alternates. This is the right thing to do for ENOENT, but not for other errors. A hard error is probably overkill here. If we fail to read an alternates file then either we'll complete our operation anyway, or we'll fail to find some needed object. Either way, a warning is good idea. And we already have a helper function to handle this pattern; let's just call warn_on_fopen_error(). Note that technically the errno from strbuf_read_file() might be from a read() error, not open(). But since read() would never return ENOENT or ENOTDIR, and since it produces a generic "unable to access" error, it's suitable for handling errors from either. Signed-off-by: Jeff King--- I'm pretty comfortable with the rationale in the final paragraph. But if we're concerned that warn_on_fopen_errors() might change, we can swap out strbuf_read_file() for fopen()+strbuf_read(). sha1_file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sha1_file.c b/sha1_file.c index b1e4193679..9cec326298 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -432,6 +432,7 @@ static void read_info_alternates(const char * relative_base, int depth) path = xstrfmt("%s/info/alternates", relative_base); if (strbuf_read_file(, path, 1024) < 0) { + warn_on_fopen_errors(path); free(path); return; } -- 2.14.1.1014.g252e627ae0
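The warning policy described above — stay silent when the optional file simply does not exist, warn on anything else — can be sketched as a small standalone helper (an illustration, not git's warn_on_fopen_errors() itself):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Open an optional file. A missing file (ENOENT, or ENOTDIR for a
 * missing parent directory) is normal and silent; any other failure,
 * e.g. EACCES, likely hides a real problem and deserves a warning.
 */
static FILE *fopen_optional(const char *path)
{
	FILE *fp = fopen(path, "r");
	if (!fp && errno != ENOENT && errno != ENOTDIR)
		fprintf(stderr, "warning: unable to access '%s': %s\n",
			path, strerror(errno));
	return fp;
}
```

The caller then treats a NULL return as "no alternates", exactly as read_info_alternates() does, but the user still hears about permission or I/O problems.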
[PATCH 1/2] read_info_alternates: read contents into strbuf
The link_alt_odb_entries() function has always taken a ptr/len pair as input. Until cf3c635210 (alternates: accept double-quoted paths, 2016-12-12), we made a copy of those bytes in a string. But after that commit, we switched to parsing the input left-to-right, and we ignore "len" totally, instead reading until we hit a NUL.

This has mostly gone unnoticed for a few reasons:

  1. All but one caller passes a NUL-terminated string, with "len" pointing to the NUL.

  2. The remaining caller, read_info_alternates(), passes in an mmap'd file. Unless the file is an exact multiple of the page size, it will generally be followed by NUL padding to the end of the page, which just works.

The easiest way to demonstrate the problem is to build with:

  make SANITIZE=address NO_MMAP=Nope test

Any test which involves $GIT_DIR/info/alternates will fail, as the mmap emulation (correctly) does not add an extra NUL, and ASAN complains about reading past the end of the buffer.

One solution would be to teach link_alt_odb_entries() to respect "len". But it's actually a bit tricky, since we depend on unquote_c_style() under the hood, and it has no ptr/len variant.

We could also just make a NUL-terminated copy of the input bytes and operate on that. But since all but one caller is already passing a string, let's instead fix that caller to provide NUL-terminated input in the first place, by swapping out mmap for strbuf_read_file(). There's no advantage to using mmap on the alternates file. It's not expected to be large (and anyway, we're copying its contents into an in-memory linked list). Nor is using git_open() buying us anything here, since we don't keep the descriptor open for a long period of time.

Let's also drop the "len" parameter entirely from link_alt_odb_entries(), since it's completely ignored. That will avoid any new callers re-introducing a similar bug.
Reported-by: Michael Haggerty
Signed-off-by: Jeff King
---
I didn't reproduce with the mmap even-page-size thing, but I think it's the same reason we didn't notice the "git log -G" read-past-mmap issues for a long time. Which makes testing with ASAN and NO_MMAP an interesting combination, as it should find any similar cases (and after this, the whole test suite passes with that configuration).

 sha1_file.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5f71bbac3e..b1e4193679 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -398,7 +398,7 @@ static const char *parse_alt_odb_entry(const char *string,
 	return end;
 }
 
-static void link_alt_odb_entries(const char *alt, int len, int sep,
+static void link_alt_odb_entries(const char *alt, int sep,
 	const char *relative_base, int depth)
 {
 	struct strbuf objdirbuf = STRBUF_INIT;
@@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
 
 static void read_info_alternates(const char * relative_base, int depth)
 {
-	char *map;
-	size_t mapsz;
-	struct stat st;
 	char *path;
-	int fd;
+	struct strbuf buf = STRBUF_INIT;
 
 	path = xstrfmt("%s/info/alternates", relative_base);
-	fd = git_open(path);
-	free(path);
-	if (fd < 0)
-		return;
-	if (fstat(fd, &st) || (st.st_size == 0)) {
-		close(fd);
+	if (strbuf_read_file(&buf, path, 1024) < 0) {
+		free(path);
 		return;
 	}
-	mapsz = xsize_t(st.st_size);
-	map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
-	close(fd);
-	link_alt_odb_entries(map, mapsz, '\n', relative_base, depth);
-
-	munmap(map, mapsz);
+	link_alt_odb_entries(buf.buf, '\n', relative_base, depth);
+	strbuf_release(&buf);
 }
 
 struct alternate_object_database *alloc_alt_odb(const char *dir)
@@ -503,7 +492,7 @@ void add_to_alternates_file(const char *reference)
 	if (commit_lock_file(lock))
 		die_errno("unable to move new alternates file into place");
 	if (alt_odb_tail)
-		link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+		link_alt_odb_entries(reference, '\n', NULL, 0);
 	}
 	free(alts);
 }
@@ -516,7 +505,7 @@ void add_to_alternates_memory(const char *reference)
 	 */
 	prepare_alt_odb();
 
-	link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+	link_alt_odb_entries(reference, '\n', NULL, 0);
 }
 
 /*
@@ -619,7 +608,7 @@ void prepare_alt_odb(void)
 	if (!alt)
 		alt = "";
 	alt_odb_tail = &alt_odb_list;
-	link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
+	link_alt_odb_entries(alt, PATH_SEP, NULL, 0);
 
 	read_info_alternates(get_object_directory(), 0);
 }
-- 
2.14.1.1014.g252e627ae0
[PATCH 0/2] fix read past end of array in alternates files
This series fixes a regression in v2.11.1 where we might read past the end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't base the patch on there, for a few reasons:

  1. There's a trivial conflict when merging up (because of git_open_noatime() becoming just git_open() in the interim).

  2. The reproduction advice relies on our SANITIZE Makefile knob, which didn't exist back then.

  3. The second patch does not apply there because we don't have warn_on_fopen_errors(). Though admittedly it could be applied separately after merging up; it's just a clean-up on top.

It does apply on the current "maint".

  [1/2]: read_info_alternates: read contents into strbuf
  [2/2]: read_info_alternates: warn on non-trivial errors

 sha1_file.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

-Peff
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
On 2017-09-18 15:38, Ben Peart wrote:
>
> On 9/17/2017 4:02 AM, Junio C Hamano wrote:
>> Ben Peart writes:
>>
>>> diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
>>> new file mode 100644
>>> index 00..482d749bb9
>>> --- /dev/null
>>> +++ b/t/helper/test-dump-fsmonitor.c
>>> @@ -0,0 +1,21 @@
>>> +#include "cache.h"
>>> +
>>> +int cmd_main(int ac, const char **av)
>>> +{
>>> +	struct index_state *istate = &the_index;
>>> +	int i;
>>> +
>>> +	setup_git_directory();
>>> +	if (do_read_index(istate, get_index_file(), 0) < 0)
>>> +		die("unable to read index file");
>>> +	if (!istate->fsmonitor_last_update) {
>>> +		printf("no fsmonitor\n");
>>> +		return 0;
>>> +	}
>>> +	printf("fsmonitor last update %"PRIuMAX"\n", istate->fsmonitor_last_update);
>>
>> After pushing this out and had Travis complain, I queued a squash on
>> top of this to cast the argument to (uintmax_t), like you did in an
>> earlier step (I think it was [PATCH 04/12]).
>
> Thanks. I'll update this to cast it as (uint64_t) as that is what
> get/put_be64 use. As far as I can tell they both map to the same thing
> (unsigned long long) so there isn't a functional difference.

(Just to double-check): This is the way to print "PRIuMAX" correctly (on all platforms):

printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate->fsmonitor_last_update);
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/17/2017 12:47 AM, Junio C Hamano wrote:

	Ben Peart writes:

		+write_integration_script() {
		+	write_script .git/hooks/fsmonitor-test <<-\EOF
		+	if [ "$#" -ne 2 ]; then
		+		echo "$0: exactly 2 arguments expected"
		+		exit 2
		+	fi
		+	if [ "$1" != 1 ]; then
		+		echo -e "Unsupported core.fsmonitor hook version.\n" >&2
		+		exit 1
		+	fi

	In addition to the "echo -e" thing pointed out earlier, these look somewhat unusual in our shell scripts, relative to what Documentation/CodingGuidelines tells us to do:

I'm happy to make these changes. I understand the difficulty of creating a consistent coding style, especially after the fact. Copy/paste is usually a developer's best friend, as it allows you to avoid a lot of errors by reusing existing tested code. One of the times it backfires is when that code doesn't match the current desired coding style.

I only point these out to help lend some additional impetus to the effort to formalize the coding style and to provide tooling to handle what should mostly be a mechanical process. IMO, the goal should be to save the maintainer and contributors the cost of having to write up and respond to formatting feedback. :)

Some stats on these same coding style errors in the current bash scripts:

	298 instances of "[a-z]\(\).*\{", i.e. "function_name() {" (no space)
	140 instances of "if \[ .* \]" (not using the preferred "test")
	293 instances of "if .*; then"

Wouldn't it be great not to have to write up style feedback when these all get copy/pasted into new scripts? :)

	- We prefer a space between the function name and the parentheses, and no space inside the parentheses. The opening "{" should also be on the same line.

		(incorrect)
		my_function(){
			...

		(correct)
		my_function () {
			...

	- We prefer "test" over "[ ... ]".

	- Do not write control structures on a single line with semicolon. "then" should be on the next line for if statements, and "do" should be on the next line for "while" and "for".
		(incorrect)
		if test -f hello; then
			do this
		fi

		(correct)
		if test -f hello
		then
			do this
		fi

	diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
	new file mode 100755
	index 00..aaee5d1fe3
	--- /dev/null
	+++ b/t/t7519/fsmonitor-watchman
	@@ -0,0 +1,128 @@
	+#!/usr/bin/perl
	+
	+use strict;
	+use warnings;
	+use IPC::Open2;
	+
	...
	+	open (my $fh, ">", ".git/watchman-query.json");
	+	print $fh "[\"query\", \"$git_work_tree\", { \
	+		\"since\": $time, \
	+		\"fields\": [\"name\"], \
	+		\"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", \"exists\"]]] \
	+		}]";
	+	close $fh;
	+
	+	print CHLD_IN "[\"query\", \"$git_work_tree\", { \
	+		\"since\": $time, \
	+		\"fields\": [\"name\"], \
	+		\"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", \"exists\"]]] \
	+		}]";

	This looks painful to read, write and maintain. IIRC, Perl supports the <<HERE document syntax, too.

I agree! I'm definitely *not* a perl developer so was unaware of this construct. A few minutes with Stack Overflow and now I can clean this up.

	+}
	\ No newline at end of file

Oops. Thanks.
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Hi Junio,

On Sat, 16 Sep 2017, Junio C Hamano wrote:

> And as you alluded to, we may need to see if we can help making it
> easier to do the latter when needed.

That mistakes it for "Someone Else's Problem".

> Johannes Schindelin writes:
>
> > That's what you are buying into by having these "What's cooking" mails
> > that are in no usable way connected to the original threads.
>
> For the above reason, I do not think this is a particularly useful
> stance to take.

I agree, but this is the process you chose to use.

> Do you have a concrete suggestion to make these individual entries more
> helpful for people who may want to go back to the original thread in
> doing so? In-reply-to: or References: fields of the "What's cooking"
> report would not help. I often have the message IDs that made/helped me
> make these individual comments in the description; they alone would not
> react to mouse clicks, though.

Oh gawd, not even more stuff piled onto the mail format. Please stop.

> Having said that, I'd expect that individual contributors have past
> messages pertaining to the smaller numbers of their own topics in flight
> in their mailbox than the project-wide "What's cooking" report covers,
> so perhaps an effort to devise such a mechanism may result in an
> over-engineering waste nobody finds useful. I dunno.

I frequently find myself putting off that What's cooking mail because it simply takes too long to study. It probably tries to serve too many purposes at the same time, and thereby none.

In the same vein as "to a hammer, everything looks like a nail": when it comes to project management, a dedicated tool will always beat a general-purpose mailing list.

Ciao,
Dscho
Re: [PATCH v6 12/12] fsmonitor: add a performance test
Hi Ben,

sorry for not catching this earlier:

On Fri, 15 Sep 2017, Ben Peart wrote:

> [...]
> +
> +int cmd_dropcaches(void)
> +{
> +	HANDLE hProcess = GetCurrentProcess();
> +	HANDLE hToken;
> +	int status;
> +
> +	if (!OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken))
> +		return error("Can't open current process token");
> +
> +	if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> +		return error("Can't get SeProfileSingleProcessPrivilege");
> +
> +	CloseHandle(hToken);
> +
> +	HMODULE ntdll = LoadLibrary("ntdll.dll");

Git's source code still tries to abide by C90, and for simplicity's sake, this extends to the Windows-specific part. Therefore, the `ntdll` variable needs to be declared at the beginning of the function (I do agree that it makes for better code to reduce the scope of variables, but C90 simply did not allow variables to be declared in the middle of functions).

I wanted to send a patch addressing this in the obvious way, but then I encountered these lines:

> +	DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> +		(DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> +	if (!NtSetSystemInformation)
> +		return error("Can't get function addresses, wrong Windows version?");

It turns out that we have seen this plenty of times in Git for Windows' fork, so much so that we came up with a nice helper to make this all a bit more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and INIT_PROC_ADDR helpers in compat/win32/lazyload.h. Maybe this would be the perfect excuse to integrate this patch into upstream Git?
This would be the patch (you can also cherry-pick it from 25c4dc3a73352e72e995594cf1b4afa46e93d040 in https://github.com/dscho/git):

-- snip --
From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin
Date: Tue, 10 Jan 2017 23:14:20 +0100
Subject: [PATCH] Win32: simplify loading of DLL functions

Dynamic loading of DLL functions is duplicated in several places in Git for Windows' source code. This patch adds a pair of macros to simplify the process: the DECLARE_PROC_ADDR(dll, rettype, function, ...) macro to be used at the beginning of a code block, and the INIT_PROC_ADDR(function) macro to call before using the declared function. The return value of the INIT_PROC_ADDR() call has to be checked; if it is NULL, the function was not found in the specified DLL.

Example:

	DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
			  LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);

	if (!INIT_PROC_ADDR(CreateHardLinkW))
		return error("Could not find CreateHardLinkW() function");

	if (!CreateHardLinkW(source, target, NULL))
		return error("could not create hardlink from %S to %S",
			     source, target);
	return 0;

Signed-off-by: Karsten Blees
Signed-off-by: Johannes Schindelin
---
 compat/win32/lazyload.h | 44
 1 file changed, 44 insertions(+)
 create mode 100644 compat/win32/lazyload.h

diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
new file mode 100644
index 000..91c10dad2fb
--- /dev/null
+++ b/compat/win32/lazyload.h
@@ -0,0 +1,44 @@
+#ifndef LAZYLOAD_H
+#define LAZYLOAD_H
+
+/* simplify loading of DLL functions */
+
+struct proc_addr {
+	const char *const dll;
+	const char *const function;
+	FARPROC pfunction;
+	unsigned initialized : 1;
+};
+
+/* Declares a function to be loaded dynamically from a DLL. */
+#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
+	static struct proc_addr proc_addr_##function = \
+		{ #dll, #function, NULL, 0 }; \
+	static rettype (WINAPI *function)(__VA_ARGS__)
+
+/*
+ * Loads a function from a DLL (once-only).
+ * Returns non-NULL function pointer on success.
+ * Returns NULL + errno == ENOSYS on failure.
+ */
+#define INIT_PROC_ADDR(function) \
+	(function = get_proc_addr(&proc_addr_##function))
+
+static inline void *get_proc_addr(struct proc_addr *proc)
+{
+	/* only do this once */
+	if (!proc->initialized) {
+		HANDLE hnd;
+		proc->initialized = 1;
+		hnd = LoadLibraryExA(proc->dll, NULL,
+				     LOAD_LIBRARY_SEARCH_SYSTEM32);
+		if (hnd)
+			proc->pfunction = GetProcAddress(hnd, proc->function);
+	}
+	/* set ENOSYS if DLL or function was not found */
+	if (!proc->pfunction)
+		errno = ENOSYS;
+	return proc->pfunction;
+}
+
+#endif
-- snap --

With this patch, this fixup to your patch would make things compile (you can also cherry-pick d05996fb61027512b8ab31a36c4a7a677dea11bb from my fork):

-- snipsnap --
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Hello, everyone. Sorry for being late in this thread; I was making sure I didn't say anything outrageously wrong.

> That's Stefan; I wouldn't have suggested any approach that uses the
> blob whose sole purpose is to serve as a temporary storage area to
> pass the information to the hook as part of the permanent record.

Hmm, as far as I understand, *this* is the status quo. We get an ephemeral sha1/oid only if you have a hook attached. Otherwise, you will never see the object at all, even if you have --signed set.

I suggested preparing this RFC/patch for the following reasons (please let me know if my understanding of any of this is wrong...):

- I find it weird that the cli allows for a --signed push yet nowhere in the receive-pack's feedback are you told whether the push certificate is computed/stored/handled at all. I think that, at the latest, receive-pack should let the user know whether the signed push succeeded, or that there is no hook attached to handle it.

- *If there is a hook*, the blob is computed, but it is up to the hook itself to store it *somewhere*. This makes me feel like it's somewhat of a useless waste of computation if the hook is not meant to handle it anyway (which is just a post-receive hook). I find it rather weird that --signed is a builtin flag, yet is handled on the server side only partially (just my two cents).

- To me, the way push certificates are handled now feels like having git commit -S produce a detached signature that you have to embed somehow in the resulting commit object. (As a matter of fact, many points on [1] seem to back this notion, and even recall some drawbacks of push certificates the way they are handled today.)

I understand the concurrency concerns, so I agree with Stefan's solution, although I don't know how big of a deal it would be, given that git already supports --atomic pushes (admittedly, I haven't checked if there are any guards in place for someone who pushes millions of refs atomically).
It'd be completely understandable to have a setting to disable handling of --signed pushes and, ideally, a way to warn the user of this feature being disabled if --signed is sent.

Maybe I'm missing the bigger picture, but to me it feels that a named ref to the latest push certificate would make it easier to handle/tool/sync around the push certificate solution?

Thanks,
-Santiago.

[1] https://public-inbox.org/git/CAJo=hJvWbjEM9E5AjPHgmQ=eY8xf=Q=xtukeu2ur7auuqea...@mail.gmail.com/
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/16/2017 11:27 AM, Torsten Bögershausen wrote:

	On 2017-09-15 21:20, Ben Peart wrote:

		+if [ "$1" != 1 ]
		+then
		+	echo -e "Unsupported core.fsmonitor hook version.\n" >&2
		+	exit 1
		+fi

	The echo -e is not portable. (It was detected by a tighter version of the lint script, which I have here but have not yet sent to the list :-()

	This will do:

		echo "Unsupported core.fsmonitor hook version." >&2

Thanks, I'll fix these and the ones in the t/t7519 directory.
Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
On 9/18/2017 9:32 AM, David Turner wrote:

	-----Original Message-----
	From: Ben Peart [mailto:peart...@gmail.com]
	Sent: Monday, September 18, 2017 9:07 AM
	To: David Turner; 'Ben Peart'
	Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
	Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

	Thanks for taking the time to review/provide feedback!

	On 9/15/2017 5:35 PM, David Turner wrote:

		+int git_config_get_fsmonitor(void)
		+{
		+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
		+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
		+
		+	if (core_fsmonitor && !*core_fsmonitor)
		+		core_fsmonitor = NULL;
		+
		+	if (core_fsmonitor)
		+		return 1;
		+
		+	return 0;
		+}

		This function's return values are backwards relative to the rest of the git_config_* functions.

	I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

Hmm, I followed the model (i.e. copy/paste :)) used by the untracked cache. If you look at how it uses the return value, it is 0 == false == remove the extension, 1 == true == add the extension. I'm doing the same with fsmonitor.

	static void tweak_untracked_cache(struct index_state *istate)
	{
		switch (git_config_get_untracked_cache()) {
		case -1: /* keep: do nothing */
			break;
		case 0: /* false */
			remove_untracked_cache(istate);
			break;
		case 1: /* true */
			add_untracked_cache(istate);
			break;
		default: /* unknown value: do nothing */
			break;
		}
	}

	void tweak_fsmonitor(struct index_state *istate)
	{
		switch (git_config_get_fsmonitor()) {
		case -1: /* keep: do nothing */
			break;
		case 0: /* false */
			remove_fsmonitor(istate);
			break;
		case 1: /* true */
			add_fsmonitor(istate);
			break;
		default: /* unknown value: do nothing */
			break;
		}
	}

[snip]

	+	/*
	+	 * With fsmonitor, we can trust the untracked cache's valid field.
	+	 */

	Did you intend to make a comment here?

Sorry. I was going to make a comment that I didn't see how that could work since we weren't touching the untracked cache here, but then I saw the bit further down. I'm still not sure it works (see comment on 10/12), but at least it could in theory work.

The logic here assumes that when the index is written out, it is valid, including the untracked cache and the CE_FSMONITOR_VALID bits. Therefore it should all still be valid as of the time the fsmonitor was queried and the index saved. Another way of thinking about this is that the fsmonitor extension is simply adding another persisted bit on each index entry. It just gets persisted/restored differently than the other persisted bits. Obviously, before we can use it assuming it reflects the *current* state of the working directory, we have to refresh the bits via the refresh logic.
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
On 9/17/2017 4:02 AM, Junio C Hamano wrote:

	Ben Peart writes:

		diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
		new file mode 100644
		index 00..482d749bb9
		--- /dev/null
		+++ b/t/helper/test-dump-fsmonitor.c
		@@ -0,0 +1,21 @@
		+#include "cache.h"
		+
		+int cmd_main(int ac, const char **av)
		+{
		+	struct index_state *istate = &the_index;
		+	int i;
		+
		+	setup_git_directory();
		+	if (do_read_index(istate, get_index_file(), 0) < 0)
		+		die("unable to read index file");
		+	if (!istate->fsmonitor_last_update) {
		+		printf("no fsmonitor\n");
		+		return 0;
		+	}
		+	printf("fsmonitor last update %"PRIuMAX"\n", istate->fsmonitor_last_update);

	After pushing this out and had Travis complain, I queued a squash on top of this to cast the argument to (uintmax_t), like you did in an earlier step (I think it was [PATCH 04/12]).

Thanks. I'll update this to cast it as (uint64_t) as that is what get/put_be64 use. As far as I can tell they both map to the same thing (unsigned long long) so there isn't a functional difference.
Re: [PATCH] t2018: remove unwanted empty line
On Monday 18 September 2017 12:52 AM, Kevin Daudt wrote:

	Why is this empty line unwanted? This kind of whitespace can help separate logical sections, just like paragraphs would.

You're right. I seem to have sent a fix hastily because I hadn't seen such separation before (forgetting the fact that git has contributors whose ways of writing tests vary widely). I should stop and think the next time rather than spamming the list. Sorry for this.

---
Kaartic
RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
> -----Original Message-----
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, September 18, 2017 9:07 AM
> To: David Turner; 'Ben Peart'
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
>
> Thanks for taking the time to review/provide feedback!
>
> On 9/15/2017 5:35 PM, David Turner wrote:
> >> -----Original Message-----
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Friday, September 15, 2017 3:21 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
> >
> >> +int git_config_get_fsmonitor(void)
> >> +{
> >> +	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> >> +		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> >> +
> >> +	if (core_fsmonitor && !*core_fsmonitor)
> >> +		core_fsmonitor = NULL;
> >> +
> >> +	if (core_fsmonitor)
> >> +		return 1;
> >> +
> >> +	return 0;
> >> +}
> >
> > This function's return values are backwards relative to the rest of the git_config_* functions.
>
> I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

> [snip]
>
> > +	/*
> > +	 * With fsmonitor, we can trust the untracked cache's valid field.
> > +	 */
> >
> > Did you intend to make a comment here?

Sorry.
I was going to make a comment that I didn't see how that could work since we weren't touching the untracked cache here, but then I saw the bit further down. I'm still not sure it works (see comment on 10/12), but at least it could in theory work.
Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.
On 9/17/2017 4:03 AM, Junio C Hamano wrote:

	Ben Peart writes:

		+[[fsmonitor-watchman]]
		+fsmonitor-watchman
		+~~~

	I've queued a mini squash on top to make sure the line aligns with the length of the string above it by adding three ~'s here.

Thanks, I'll do the same assuming there will be another re-roll.
Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.
On 9/15/2017 3:43 PM, David Turner wrote: -Original Message- From: Ben Peart [mailto:benpe...@microsoft.com] Sent: Friday, September 15, 2017 3:21 PM To: benpe...@microsoft.com Cc: David Turner; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net Subject: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension. This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension. Signed-off-by: Ben Peart --- Documentation/config.txt | 6 ++ Documentation/githooks.txt | 23 +++ Documentation/technical/index-format.txt | 19 +++ 3 files changed, 48 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index dc4e3f58a2..c196007a27 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -413,6 +413,12 @@ core.protectNTFS:: 8.3 "short" names. Defaults to `true` on Windows, and `false` elsewhere. +core.fsmonitor:: + If set, the value of this variable is used as a command which + will identify all files that may have changed since the + requested date/time. This information is used to speed up git by + avoiding unnecessary processing of files that have not changed. I'm confused here. You have a file called "fsmonitor-watchman", which seems to discuss the protocol for core.fsmonitor scripts in general, and you have this documentation, which does not link to that file. Can you clarify this? I'll add the missing link to the documentation in githooks.txt. The documentation should be enough for someone to develop another integration script. The fsmonitor-watchman script allows people to easily use this patch series with the existing Watchman monitor but it can certainly also be used as a sample for how to integrate with another file system monitor. +The hook should output to stdout the list of all files in the working +directory that may have changed since the requested time. 
The logic +should be inclusive so that it does not miss any potential changes. +"It is OK to include files which have not actually changed. Newly-created and deleted files should also be included. When files are renamed, both the old and the new name should be included." Also, please discuss case sensitivity issues (e.g. on OS X). +The paths should be relative to the root of the working directory and +be separated by a single NUL. + - 32-bit version number: the current supported version is 1. + + - 64-bit time: the extension data reflects all changes through the given + time which is stored as the nanoseconds elapsed since midnight, + January 1, 1970. Nit: Please specify signed or unsigned for these. (I expect to be getting out of cryosleep around 2262, and I want to know if my old git repos will keep working...) While I'm not opposed to specifying unsigned, I did notice that the only place signed/unsigned is specified today is in "index entry." Everywhere else doesn't specify so I left it off for consistency. I've not seen negative version numbers nor negative time so am not entirely sure it is necessary to clarify. :) + - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. + + - An ewah bitmap, the n-th bit indicates whether the n-th index entry +is not CE_FSMONITOR_VALID.
Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

	-----Original Message-----
	From: Ben Peart [mailto:benpe...@microsoft.com]
	Sent: Friday, September 15, 2017 3:21 PM
	To: benpe...@microsoft.com
	Cc: David Turner; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
	Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

	+int git_config_get_fsmonitor(void)
	+{
	+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
	+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
	+
	+	if (core_fsmonitor && !*core_fsmonitor)
	+		core_fsmonitor = NULL;
	+
	+	if (core_fsmonitor)
	+		return 1;
	+
	+	return 0;
	+}

	This function's return values are backwards relative to the rest of the git_config_* functions.

I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

[snip]

	+	/*
	+	 * With fsmonitor, we can trust the untracked cache's valid field.
	+	 */

	Did you intend to make a comment here?

[snip]

	+int read_fsmonitor_extension(struct index_state *istate, const void *data,
	+	unsigned long sz)
	+{

	If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.

Good catch! Thank you.

[snip]

	+	/* a fsmonitor process can return '*' to indicate all entries are invalid */

	That's not documented in your documentation. Also, I'm not sure I like it: what if I have a file whose name starts with '*'? Yeah, that would be silly, but this indicates the need for the protocol to have some sort of signaling mechanism that's out-of-band. Maybe have some key\0value\0 pairs and then \0\0 and then the list of files? Or, if you want to keep it really simple, allow an entry of '/' (which is an invalid filename) to mean 'all'.
Yea, this was an optimization I added late in the game to get around an
issue in Watchman where it returns the name of every file the first time
you query it (rather than the set of files that have actually changed
since the requested time). I didn't realize the wild card '*' was a
valid character for a filename. I'll switch to '/' as you suggest as I
don't want to complicate things unnecessarily to handle this relatively
rare optimization. I'll also get it documented properly. Thanks!

>> +void add_fsmonitor(struct index_state *istate)
>> +{
>> +	int i;
>> +
>> +	if (!istate->fsmonitor_last_update) {
>
> [snip]
>
>> +	/* reset the untracked cache */
>
> Is this really necessary? Shouldn't the untracked cache be in a
> correct state already?

When fsmonitor is not turned on, I'm not explicitly invalidating
untracked cache directory entries as git makes changes to files. While
I doubt the sequence happens of 1) git making changes to files, *then*
2) turning on fsmonitor - I thought it better safe than sorry to assume
that pattern won't ever happen in the future. Especially since turning
on the extension is rare and the cost is low.

>> +/*
>> + * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate
>
> Nit: "s/entries/entry's/".
Re: [PATCH] test-drop-caches: mark file local symbols with static
On 9/17/2017 12:14 PM, Ramsay Jones wrote:
> Signed-off-by: Ramsay Jones
> ---
>
> Hi Ben,
>
> If you need to re-roll your 'bp/fsmonitor' branch, could you please
> squash this into the relevant patch (commit c6b5a28941, "fsmonitor:
> add a performance test", 15-09-2017).

Absolutely. Thanks!

I'd really appreciate some feedback on whether this works properly on
platforms other than Windows. Thanks!

> ATB,
> Ramsay Jones
>
>  t/helper/test-drop-caches.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
> index 717079865..17590170f 100644
> --- a/t/helper/test-drop-caches.c
> +++ b/t/helper/test-drop-caches.c
> @@ -2,7 +2,7 @@
>
>  #if defined(GIT_WINDOWS_NATIVE)
>
> -int cmd_sync(void)
> +static int cmd_sync(void)
>  {
>  	char Buffer[MAX_PATH];
>  	DWORD dwRet;
> @@ -49,7 +49,7 @@ typedef enum _SYSTEM_MEMORY_LIST_COMMAND {
>  	MemoryCommandMax
>  } SYSTEM_MEMORY_LIST_COMMAND;
>
> -BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
> +static BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
>  {
>  	BOOL bResult;
>  	DWORD dwBufferLength;
> @@ -77,7 +77,7 @@ BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
>  	return bResult;
>  }
>
> -int cmd_dropcaches(void)
> +static int cmd_dropcaches(void)
>  {
>  	HANDLE hProcess = GetCurrentProcess();
>  	HANDLE hToken;
> @@ -118,36 +118,36 @@ int cmd_dropcaches(void)
>
>  #elif defined(__linux__)
>
> -int cmd_sync(void)
> +static int cmd_sync(void)
>  {
>  	return system("sync");
>  }
>
> -int cmd_dropcaches(void)
> +static int cmd_dropcaches(void)
>  {
>  	return system("echo 3 | sudo tee /proc/sys/vm/drop_caches");
>  }
>
>  #elif defined(__APPLE__)
>
> -int cmd_sync(void)
> +static int cmd_sync(void)
>  {
>  	return system("sync");
>  }
>
> -int cmd_dropcaches(void)
> +static int cmd_dropcaches(void)
>  {
>  	return system("sudo purge");
>  }
>
>  #else
>
> -int cmd_sync(void)
> +static int cmd_sync(void)
>  {
>  	return 0;
>  }
>
> -int cmd_dropcaches(void)
> +static int cmd_dropcaches(void)
>  {
>  	return error("drop caches not implemented on this platform");
>  }
Re: RFC v3: Another proposed hash function transition plan
Hi Johannes,

> SHA-256 got much more cryptanalysis than SHA3-256 […].

I do not think this is true. Keccak/SHA-3 actually got (and is still
getting) a lot of cryptanalysis, with papers published at renowned
crypto conferences [1]. Keccak/SHA-3 is recognized to have a
significant safety margin. E.g., one can cut the number of rounds in
half (as in Keyak or KangarooTwelve) and still get a very strong
function. I don't think we could say the same for SHA-256 or SHA-512…

Kind regards,
Gilles, for the Keccak team

[1] https://keccak.team/third_party.html
Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
On 9/17/2017 5:51 PM, Junio C Hamano wrote:
> Derrick Stolee writes:
>
>> +int cmd_main(int ac, const char **av)
>> +{
>> +	setup_git_directory();
>
> As far as I recall, we do not (yet) allow declaration after statement
> in our codebase. Move this down to make it after all decls.

Will fix.

>> +
>> +	unsigned int hash_delt = 0x13579BDF;
>> +	unsigned int hash_base = 0x01020304;
>> +	struct object_id oid;
>> +
>> +	int i, count = 0;
>> +	int n = sizeof(struct object_id) / sizeof(int);
>
> It probably is technically OK to assume sizeof(int) always equals to
> sizeof(unsigned), but because you use 'n' _only_ to work with uint and
> never with int, it would make more sense to match.

Will fix.

> I also notice that "n" should be const. But I do not think we want
> this "clever" optimization that involves 'n' in the first place.
>
>> +	while (count++ < 10) {
>> +		for (i = 0; i < n; i++)
>> +			((unsigned int*)oid.hash)[i] = hash_base;
>
> Does it make sense to assume that uint is always 4-byte (so this code
> won't work if it is 8-byte on your platform) and doing this is faster
> than using platform-optimized memcpy()?

I'm not sure what you mean by using memcpy to improve this, because it
would require calling memcpy in the inner loop, such as:

	for (i = 0; i < n; i++)
		memcpy(oid.hash + i * sizeof(unsigned), &hash_base,
		       sizeof(unsigned));

I'm probably misunderstanding your intended use of memcpy().

>> +		find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
>> +
>> +		hash_base += hash_delt;
>> +	}
>
> I also wonder if this is measuring the right thing. I am guessing that
> by making many queries for a unique abbreviation of "random" (well, it
> is deterministic, but my point is these queries are not based on the
> names of objects that exist in the repository) hashes, this test
> measures how much time it takes for us to decide that such a random
> hash does not exist.
>
> In the real life use, we make the opposite query far more frequently:
> we have an object that we _know_ exists in the repository and we try
> to find a sufficient length to disambiguate it from others, and I
> suspect that these two use different logic. Don't you need to be
> measuring the time it takes to compute the shortest abbreviation of an
> object that exists instead?

First, this doesn't just measure the time it takes to determine
non-existence, because it finds the len required to disambiguate an
"incoming" hash from all known hashes. When testing, I put in a simple
printf to report the resulting abbreviation so I could see how often it
needed to be extended. In this sense, the test exposes the while loop
that is removed by PATCH 2/3.

Second, your criticism about extant hashes is valid, and one I
struggled to reconcile. I see two issues with testing known hashes:

1. By determining the hash exists, we have inspected the file that
   contains it (either a .idx or the loose-object directory). This has
   side-effects that warm up the file cache so the looped method is
   artificially faster to find matching objects. The effect is
   particularly significant on a repo with exactly one packfile.

2. If we iterate over the entire set of objects, this test takes
   O(N*t(N)) time, where t(N) is the average time to compute a minimum
   abbreviation. For large repos, this can take several minutes.
   Instead, even with the constant 100,000 test hashes, we have an
   O(t(N)) test.

We could avoid the asymptotic growth by limiting the number of existing
hashes we use, but how do we find a sufficiently uniform sample from
them?

By looking at some other perf tests, I see that we can add a
prerequisite action. I will investigate making another helper that
uniformly selects a set of hashes from the repo and writes them to
stdout in a random order. p0008-abbrev.sh will run the helper as a
prerequisite, saving the data to a file. test-abbrev will read the
hashes from stdin to test find_unique_abbrev.

This should avoid the side-effects I mentioned (test-abbrev will not
warm up indexes) while also testing abbreviation lengths for existing
objects.

I'll get started on these changes and send a new patch with new perf
data in a couple days. Please let me know if there is a better way to
measure performance for this change.

Thanks,
-Stolee