Re: [PATCH 0/4]
Stefan Beller writes:

> A couple days before the 2.19 release we had a bug report about
> broken submodules[1] and reverted[2] the commits leading up to them.
>
> The behavior of said bug fixed itself by taking a different approach[3],
> specifically by a weaker enforcement of having `core.worktree` set in a
> submodule [4].
>
> The revert [2] was overly broad as we neared the release, such that we wanted
> to rather keep the known buggy behavior of always having `core.worktree` set,
> rather than figuring out how to fix the new bug of having 'git submodule update'
> not working in old style repository setups.
>
> This series re-introduces those reverted patches, with no changes in code,
> but with drastically changed commit messages, as those focus on why it is safe
> to re-introduce them instead of explaining the desire for the change.

The above was a bit too cryptic for me to grok, so let me try
rephrasing to see if I got them all correctly.

 - the three-patch series leading to 984cd77ddb was meant to fix some
   bug, but the series itself was buggy and caused problems; we got
   rid of it

 - the problem 984cd77ddb wanted to fix was fixed differently, without
   reintroducing the problem the three-patch series introduced. That
   fix has already been with us since 4d6d6ef1fc.

 - now these three changes that were problematic in the past are
   resent without any update (other than that the series has one
   preparatory patch to add tests).

Is that what is going on?  Obviously I am not getting "the other"
benefit we wanted to gain out of these three patches (because the
above description fails to explain what that is), other than fixing
the issue that was fixed by 4d6d6ef1fc.

Sorry for being puzzled...
> [1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
> [2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
> [3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
> [4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by
>     ensure-core-worktree, 2018-08-13)
>
> Stefan Beller (4):
>   submodule update: add regression test with old style setups
>   submodule: unset core.worktree if no working tree is present
>   submodule--helper: fix BUG message in ensure_core_worktree
>   submodule deinit: unset core.worktree
>
>  builtin/submodule--helper.c        |  4 +++-
>  submodule.c                        | 14 ++
>  submodule.h                        |  2 ++
>  t/lib-submodule-update.sh          |  5 +++--
>  t/t7400-submodule-basic.sh         |  5 +
>  t/t7412-submodule-absorbgitdirs.sh |  7 ++-
>  6 files changed, 33 insertions(+), 4 deletions(-)
[PATCH 0/4]
A couple days before the 2.19 release we had a bug report about broken
submodules[1] and reverted[2] the commits leading up to them.

The behavior of said bug fixed itself by taking a different approach[3],
specifically by a weaker enforcement of having `core.worktree` set in a
submodule [4].

The revert [2] was overly broad as we neared the release, such that we
wanted to rather keep the known buggy behavior of always having
`core.worktree` set, rather than figuring out how to fix the new bug of
having 'git submodule update' not working in old style repository
setups.

This series re-introduces those reverted patches, with no changes in
code, but with drastically changed commit messages, as those focus on
why it is safe to re-introduce them instead of explaining the desire
for the change.

[1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
[2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
[3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
[4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by
    ensure-core-worktree, 2018-08-13)

Stefan Beller (4):
  submodule update: add regression test with old style setups
  submodule: unset core.worktree if no working tree is present
  submodule--helper: fix BUG message in ensure_core_worktree
  submodule deinit: unset core.worktree

 builtin/submodule--helper.c        |  4 +++-
 submodule.c                        | 14 ++
 submodule.h                        |  2 ++
 t/lib-submodule-update.sh          |  5 +++--
 t/t7400-submodule-basic.sh         |  5 +
 t/t7412-submodule-absorbgitdirs.sh |  7 ++-
 6 files changed, 33 insertions(+), 4 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog
Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering
Hi Junio,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" writes:
>
> > An alternative approach which was rejected at the time (because it
> > interfered with the then-ongoing work to compile Git for Windows using
> > MS Visual C++) would patch the make_environment_block() function to
> > skip the specified environment variables before handing down the
> > environment block to the spawned process. Currently it would interfere
> > with the mingw-utf-8-env patch series I sent earlier today
> > [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].
>
> So the rejected approach that was not used in this series would
> interfere with the other one, but I do not have to worry about it
> because this series does not use that approach? I had to read the six
> lines above twice to figure out that it essentially is saying "I
> shouldn't care", but please let me know if I misread the paragraph and I
> need to care ;-)

Well, you might want to worry about it. Or not.

The approach taken by this patch series is to call `unsetenv()` for the
variable names listed in `core.unsetenvvars` (if any), just before
spawning off the first process (if any).

What I meant by this comment in the cover letter is that I thought about
doing it differently. We already have a perfectly fine function called
`make_environment_block()` that takes a "deltaenv", and then constructs
a new environment block from the current environment plus deltaenv. And
this would be an obvious alternative place to "unset" the variables, as
this function is only called just before spawning new processes.

I was weighing both options, and both back then and right now, there are
other patches in flight that conflict with the second approach, so the
first approach is what I went with.
From a purely aesthetic point of view, the second approach looks nicer,
as it really only affects the spawned processes (and not the current
one), and it allows for changing the list between spawning processes.
But to do it right, I would also have to use a hash set, and it would
complicate the code of `make_environment_block()` even further. And it
sounds a bit like overkill to me, too.

So I sided with the approach presented in the current revision of the
patch series, but I wanted to describe the other approach in case you
(or other reviewers) have a different opinion.

> > While at it, this patch series also cleans up house and moves the
> > Windows-specific core.* variable handling to compat/mingw.c rather
> > than cluttering environment.c and config.c with things that e.g.
> > developers on Linux do not want to care about.
>
> Or Macs.

Or macOS. Or FreeBSD. Or Irix. Or any other, that's right ;-)
Traditionally, we did not really care all that much about platforms
other than Linux, though, which is what made me write what I wrote.

Having said that, I get the impression that this is changing slowly. The
benefits are pretty clear, too. After all, it was a *Windows* build
failure recently that let Peff identify and fix a *non-Windows* bug...

> When I skimmed this series earlier, I found that patches 2 & 3 are
> sensibly implemented to achieve this goal.

Thanks!
Dscho

> > Johannes Schindelin (4):
> >   config: rename `dummy` parameter to `cb` in git_default_config()
> >   Allow for platform-specific core.* config settings
> >   Move Windows-specific config settings into compat/mingw.c
> >   mingw: unset PERL5LIB by default
> >
> >  Documentation/config.txt     |  6
> >  cache.h                      |  8 -
> >  compat/mingw.c               | 58 +++-
> >  compat/mingw.h               |  3 ++
> >  config.c                     | 18 ---
> >  environment.c                |  1 -
> >  git-compat-util.h            |  8 +
> >  t/t0029-core-unsetenvvars.sh | 30 +++
> >  8 files changed, 109 insertions(+), 23 deletions(-)
> >  create mode 100755 t/t0029-core-unsetenvvars.sh
> >
> > base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> > Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/62
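For readers following along, the approach the series took (unset a configured list of variable names just before the first process is spawned) is easy to model outside of C. The sketch below is a hypothetical Python illustration of the idea only, not git's actual code; the real implementation parses `core.unsetenvvars` and calls `unsetenv()` in compat/mingw.c, and the variable names used here are illustrative:

```python
def unset_env_vars(config_value, env):
    """Drop each comma-separated variable named in the config value from
    the given environment mapping, mirroring what the core.unsetenvvars
    approach does via unsetenv() before the first process is spawned."""
    for name in config_value.split(","):
        env.pop(name.strip(), None)
    return env

# A child spawned with this environment no longer sees PERL5LIB,
# so git's shipped Perl interpreter is not misdirected.
env = {"PERL5LIB": "/opt/other-perl/lib", "PATH": "/usr/bin"}
unset_env_vars("PERL5LIB,PERLLIB", env)
```

The alternative (filtering inside `make_environment_block()`) would touch only the block handed to the child, which is why it was considered aesthetically nicer in the discussion above.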
Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering
"Johannes Schindelin via GitGitGadget" writes:

> An alternative approach which was rejected at the time (because it
> interfered with the then-ongoing work to compile Git for Windows using MS
> Visual C++) would patch the make_environment_block() function to skip the
> specified environment variables before handing down the environment block to
> the spawned process. Currently it would interfere with the mingw-utf-8-env
> patch series I sent earlier today
> [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

So the rejected approach that was not used in this series would
interfere with the other one, but I do not have to worry about it
because this series does not use that approach?  I had to read the six
lines above twice to figure out that it essentially is saying "I
shouldn't care", but please let me know if I misread the paragraph and I
need to care ;-)

> While at it, this patch series also cleans up house and moves the
> Windows-specific core.* variable handling to compat/mingw.c rather than
> cluttering environment.c and config.c with things that e.g. developers on
> Linux do not want to care about.

Or Macs.

When I skimmed this series earlier, I found that patches 2 & 3 are
sensibly implemented to achieve this goal.
> Johannes Schindelin (4):
>   config: rename `dummy` parameter to `cb` in git_default_config()
>   Allow for platform-specific core.* config settings
>   Move Windows-specific config settings into compat/mingw.c
>   mingw: unset PERL5LIB by default
>
>  Documentation/config.txt     |  6
>  cache.h                      |  8 -
>  compat/mingw.c               | 58 +++-
>  compat/mingw.h               |  3 ++
>  config.c                     | 18 ---
>  environment.c                |  1 -
>  git-compat-util.h            |  8 +
>  t/t0029-core-unsetenvvars.sh | 30 +++
>  8 files changed, 109 insertions(+), 23 deletions(-)
>  create mode 100755 t/t0029-core-unsetenvvars.sh
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/62
[PATCH 0/4] mingw: prevent external PERL5LIB from interfering
In Git for Windows, we do not support running the Perl scripts with any
random Perl interpreter. Instead, we insist on using the shipped one
(except for MinGit, where we do not even ship the Perl scripts, to save
on space).

However, if Git is called from, say, a Perl script running in a
different Perl interpreter with appropriately configured PERL5LIB, it
messes with Git's ability to run its Perl scripts. For that reason, we
devised the presented method of defining a list of environment
variables (via core.unsetEnvVars) that would then be unset before
spawning any process, defaulting to PERL5LIB.

An alternative approach which was rejected at the time (because it
interfered with the then-ongoing work to compile Git for Windows using
MS Visual C++) would patch the make_environment_block() function to
skip the specified environment variables before handing down the
environment block to the spawned process. Currently it would interfere
with the mingw-utf-8-env patch series I sent earlier today
[https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

While at it, this patch series also cleans up house and moves the
Windows-specific core.* variable handling to compat/mingw.c rather than
cluttering environment.c and config.c with things that e.g. developers
on Linux do not want to care about.
Johannes Schindelin (4):
  config: rename `dummy` parameter to `cb` in git_default_config()
  Allow for platform-specific core.* config settings
  Move Windows-specific config settings into compat/mingw.c
  mingw: unset PERL5LIB by default

 Documentation/config.txt     |  6
 cache.h                      |  8 -
 compat/mingw.c               | 58 +++-
 compat/mingw.h               |  3 ++
 config.c                     | 18 ---
 environment.c                |  1 -
 git-compat-util.h            |  8 +
 t/t0029-core-unsetenvvars.sh | 30 +++
 8 files changed, 109 insertions(+), 23 deletions(-)
 create mode 100755 t/t0029-core-unsetenvvars.sh

base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/62

-- 
gitgitgadget
Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking
Geert Jansen writes:

> Maybe a better option would be to check for the non-existence of the
> [00-ff] directories under .git/objects.

Please do not do this; I expect many people do this before they leave
work, just like I do:

    $ git repack -a -d -f --window=$large --depth=$small
    $ git prune

which would typically leave only info/ and pack/ subdirectories under
the .git/objects/ directory.
Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking
On Sun, Oct 28, 2018 at 10:50:19PM +, Ævar Arnfjörð Bjarmason wrote:

> I left the door open for that in the new config option 4/4 implements,
> but I suspect for Geert's purposes this is something he'd prefer to
> turn off in git on clone entirely, i.e. because it may be running on
> some random Amazon's customer's EFS instance, and they won't know
> about this new core.checkCollisions option.
>
> But maybe I'm wrong about that and Geert is happy to just turn on
> core.checkCollisions=false and use this series instead.

I think that the best user experience would probably be if git were
fast by default without having to give up on (future) security by
removing the sha1 collision check.

Maybe core.checkCollisions could default to "on" only when there's no
loose objects in the repository? That would give a fast experience for
many common cases (git clone, git init && git fetch) while still doing
the collision check when relevant.

My patch used the --cloning flag as an approximation of "no loose
objects". Maybe a better option would be to check for the non-existence
of the [00-ff] directories under .git/objects.
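The suggested heuristic could look something like the sketch below (Python for illustration; a real implementation would live in git's C code). Note the caveat Junio raises elsewhere in the thread: a fully repacked and pruned repository also has only info/ and pack/ left, so the absence of fan-out directories does not prove a fresh clone:

```python
import os
import string

def has_loose_object_dirs(objects_dir):
    """Check for any two-hex-digit fan-out directory under .git/objects.
    Sketch of the heuristic discussed; a repo with none may be a fresh
    clone, but may equally be one that was just repacked and pruned."""
    hexdigits = set(string.hexdigits.lower())
    if not os.path.isdir(objects_dir):
        return False
    for entry in os.listdir(objects_dir):
        if (len(entry) == 2 and set(entry) <= hexdigits
                and os.path.isdir(os.path.join(objects_dir, entry))):
            return True
    return False
```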
[PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking
This patch series implements what I suggested in
https://public-inbox.org/git/87lg6jljmf@evledraar.gmail.com/

It's not a replacement for what Geert Jansen's RFC in
https://public-inbox.org/git/ed25e182-c296-4d08-8170-340567d89...@amazon.com/
does, which was to turn this off entirely on "clone".

I left the door open for that in the new config option 4/4 implements,
but I suspect for Geert's purposes this is something he'd prefer to
turn off in git on clone entirely, i.e. because it may be running on
some random Amazon's customer's EFS instance, and they won't know about
this new core.checkCollisions option.

But maybe I'm wrong about that and Geert is happy to just turn on
core.checkCollisions=false and use this series instead.

Ævar Arnfjörð Bjarmason (4):
  pack-objects test: modernize style
  pack-objects tests: don't leave test .git corrupt at end
  index-pack tests: don't leave test repo dirty at end
  index-pack: add ability to disable SHA-1 collision check

 Documentation/config.txt     | 68
 builtin/index-pack.c         |  7 ++--
 cache.h                      |  1 +
 config.c                     | 20 +++
 config.h                     |  1 +
 environment.c                |  1 +
 t/README                     |  3 ++
 t/t1060-object-corruption.sh | 37 +++-
 t/t5300-pack-object.sh       | 51 +++
 9 files changed, 163 insertions(+), 26 deletions(-)

-- 
2.19.1.759.g500967bb5e
Re: [PATCH 0/4] Bloom filter experiment
Ævar Arnfjörð Bjarmason writes:

>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_ big
>> filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.
>
> I.e. we might later add some discovery step where we look at N number of
> commits at random, until we're satisfied that we've come up with some
> average/median number of total (recursive) tree entries & how many tend
> to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k files
> where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I was more interested to find out what the advice our docs should give
to the end users to tune the value once such a knob is invented, and
what you gave in the above paragraphs may lead us to a nice auto-tuning
heuristic.
Re: [PATCH 0/4] Bloom filter experiment
> | Implementation | Queries | Maybe | FP # | FP %  |
> |----------------|---------|-------|------|-------|
> | Szeder         | 66095   | 1142  | 256  | 0.38% |
> | Jonathan       | 66459   | 107   | 89   | 0.16% |
> | Stolee         | 53025   | 492   | 479  | 0.90% |
>
> (Note that we must have used different starting points, which is why my
> "Queries" is so much smaller.)

I suspect it's because your bloom filter implementation covers only the
first parent (if I'm understanding get_bloom_filter() correctly). When I
only covered the first parent in my initial test (see patch 2 of [1]), I
got (following the columns in the table above):

  53096 107 89 0.001676

Also, I think that the rejecting power

  (Queries - Maybe) / (Total tree comparisons if no bloom filters were used)

needs to be in the evaluation criteria somewhere, as that indicates how
many tree comparisons we managed to avoid. Also, we probably should
also test on a file that changes more frequently :-)

[1] https://public-inbox.org/git/cover.1539219248.git.jonathanta...@google.com/

> The increase in false-positive percentage is expected in my
> implementation. I'm using the correct filter sizes to hit a <1% FP
> ratio. This could be lowered by changing the settings, and the size
> would dynamically grow. For my Git repo (which contains
> git-for-windows/git and microsoft/git) this implementation grows the
> commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to
> Szeder's 8MB filter). For 105,260 commits, that rounds out to less
> than 20 bytes per commit (compared to Jonathan's 256 bytes per commit).

Mine has 256 bits per commit, which is 32 bytes per commit (still more
than yours). Having said all that, thanks for writing up your version -
in particular, variable sized filters (like in yours) seem to be the
way to go.

> We'll see how much time I have to do this polish, but I think the
> benefit is proven.

Agreed.
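For concreteness, the metrics being compared in the table can be derived directly from the trace counters. This is a small illustrative helper, not anything from the patches; note that "rejecting power" is approximated here with total queries as the denominator, since the thread leaves the exact denominator (total tree comparisons without filters) open:

```python
def filter_stats(queries, maybe, false_positives):
    """Summarize Bloom filter effectiveness from the trace counters:
    a 'definitely not' answer skips a tree comparison entirely, while
    each false positive is a wasted verification."""
    return {
        "definitely_not": queries - maybe,
        "fp_ratio": false_positives / queries,
        "rejecting_power": (queries - maybe) / queries,
    }

# Szeder's numbers from the table
stats = filter_stats(queries=66095, maybe=1142, false_positives=256)
```

Plugging in Szeder's row reproduces the trace output quoted later in the thread (64953 "definitely not", fp ratio 0.003873).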
Re: [PATCH 0/4] Bloom filter experiment
On 10/16/2018 8:57 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Oct 16 2018, Derrick Stolee wrote:
>> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>>> Derrick Stolee writes:
>>>> 2. The filters are sized according to the number of changes in each
>>>> commit, with a minimum of one 64-bit word.
>>>> ...
>>>> 6. When we compute the Bloom filters, we don't store a filter for
>>>> commits whose first-parent diff has more than 512 paths.
>>> Just being curious but was 512 taken out of thin air or is there
>>> some math behind it, e.g. to limit false positive rate down to
>>> certain threshold? With a wide-enough bitset, you could store
>>> arbitrary large number of paths with low enough false positive, I
>>> guess, but is there a point where there is too many paths in the
>>> change that gives us diminishing returns and not worth having a
>>> filter in the first place?
>> 512 is somewhat arbitrary, but having a maximum size is not.
>>> In a normal source-code-control context, the set of paths modified
>>> by any single commit ought to be a small subset of the entire paths,
>>> and whole-tree changes ought to be fairly rare. In a project for
>>> which that assumption does not hold, it might help to have a
>>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>>> modify A?" and the filter would say "we know it does not, because we
>>> threw all the paths that are not touched to the bloom filter"), but
>>> I think that would optimize for a wrong case.
>>
>> A commit with many changed paths is very rare. The 512 I picked above
>> is enough to cover 99% of commits in each of the repos I sampled when
>> first investigating Bloom filters.
>>
>> When a Bloom filter response says "maybe yes" (in our case, "maybe
>> not TREESAME"), then we need to verify that it is correct. In the
>> extreme case that every path is changed, then the Bloom filter does
>> nothing but add extra work.
>>
>> These extreme cases are also not unprecedented: in our Azure Repos
>> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
>> when it was time to dogfood VFS for Git, we needed to turn off the
>> smudge filter. So, there is one commit that converts every LF to a
>> CRLF in every text file. Storing a Bloom filter for those ~250,000
>> entries would take ~256KB for essentially no value. By not storing a
>> filter for this commit, we go immediately to the regular TREESAME
>> check, which would happen for most pathspecs.
>>
>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_
>> big filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.

My proof-of-concept has it as a constant, but part of my plan is to
make these all config options, as of this item in my message:

>>> 2. We need config values for writing and consuming bloom filters,
>>> but also to override the default settings.

> I.e. we might later add some discovery step where we look at N number
> of commits at random, until we're satisfied that we've come up with
> some average/median number of total (recursive) tree entries & how
> many tend to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k
> files where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I'm not sure a dynamic approach would be worth the effort, but I'm open
to hearing the results of an experiment.

Thanks,
-Stolee
Re: [PATCH 0/4] Bloom filter experiment
On Tue, Oct 16 2018, Derrick Stolee wrote:

> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>> Derrick Stolee writes:
>>
>>> 2. The filters are sized according to the number of changes in each
>>> commit, with a minimum of one 64-bit word.
>>> ...
>>> 6. When we compute the Bloom filters, we don't store a filter for
>>> commits whose first-parent diff has more than 512 paths.
>> Just being curious but was 512 taken out of thin air or is there
>> some math behind it, e.g. to limit false positive rate down to
>> certain threshold? With a wide-enough bitset, you could store
>> arbitrary large number of paths with low enough false positive, I
>> guess, but is there a point where there is too many paths in the
>> change that gives us diminishing returns and not worth having a
>> filter in the first place?
> 512 is somewhat arbitrary, but having a maximum size is not.
>> In a normal source-code-control context, the set of paths modified
>> by any single commit ought to be a small subset of the entire paths,
>> and whole-tree changes ought to be fairly rare. In a project for
>> which that assumption does not hold, it might help to have a
>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>> modify A?" and the filter would say "we know it does not, because we
>> threw all the paths that are not touched to the bloom filter"), but
>> I think that would optimize for a wrong case.
>
> A commit with many changed paths is very rare. The 512 I picked above
> is enough to cover 99% of commits in each of the repos I sampled when
> first investigating Bloom filters.
>
> When a Bloom filter response says "maybe yes" (in our case, "maybe not
> TREESAME"), then we need to verify that it is correct. In the extreme
> case that every path is changed, then the Bloom filter does nothing
> but add extra work.
>
> These extreme cases are also not unprecedented: in our Azure Repos
> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
> when it was time to dogfood VFS for Git, we needed to turn off the
> smudge filter. So, there is one commit that converts every LF to a
> CRLF in every text file. Storing a Bloom filter for those ~250,000
> entries would take ~256KB for essentially no value. By not storing a
> filter for this commit, we go immediately to the regular TREESAME
> check, which would happen for most pathspecs.
>
> This is all to say: having a maximum size is good. 512 is big enough
> to cover _most_ commits, but not so big that we may store _really_ big
> filters.

Makes sense. 512 is good enough to hardcode initially, but I couldn't
tell from briefly skimming the patches if it was possible to make this
size dynamic per-repo when the graph/filter is written.

I.e. we might later add some discovery step where we look at N number
of commits at random, until we're satisfied that we've come up with
some average/median number of total (recursive) tree entries & how many
tend to be changed per-commit.

I.e. I can imagine repositories (with some automated changes) where we
have 10k files and tend to change 1k per commit, or ones with 10k files
where we tend to change just 1-10 per commit, which would mean a
larger/smaller filter would be needed / would do.
Re: [PATCH 0/4] Bloom filter experiment
On 10/16/2018 12:45 AM, Junio C Hamano wrote:
> Derrick Stolee writes:
>
>> 2. The filters are sized according to the number of changes in each
>> commit, with a minimum of one 64-bit word.
>> ...
>> 6. When we compute the Bloom filters, we don't store a filter for
>> commits whose first-parent diff has more than 512 paths.
>
> Just being curious but was 512 taken out of thin air or is there some
> math behind it, e.g. to limit false positive rate down to certain
> threshold? With a wide-enough bitset, you could store arbitrary large
> number of paths with low enough false positive, I guess, but is there
> a point where there is too many paths in the change that gives us
> diminishing returns and not worth having a filter in the first place?

512 is somewhat arbitrary, but having a maximum size is not.

> In a normal source-code-control context, the set of paths modified by
> any single commit ought to be a small subset of the entire paths, and
> whole-tree changes ought to be fairly rare. In a project for which
> that assumption does not hold, it might help to have a negative bloom
> filter (i.e. "git log -- A" asks "does the commit modify A?" and the
> filter would say "we know it does not, because we threw all the paths
> that are not touched to the bloom filter"), but I think that would
> optimize for a wrong case.

A commit with many changed paths is very rare. The 512 I picked above
is enough to cover 99% of commits in each of the repos I sampled when
first investigating Bloom filters.

When a Bloom filter response says "maybe yes" (in our case, "maybe not
TREESAME"), then we need to verify that it is correct. In the extreme
case that every path is changed, then the Bloom filter does nothing but
add extra work.

These extreme cases are also not unprecedented: in our Azure Repos
codebase, we were using core.autocrlf to smudge CRLFs to LFs, but when
it was time to dogfood VFS for Git, we needed to turn off the smudge
filter. So, there is one commit that converts every LF to a CRLF in
every text file. Storing a Bloom filter for those ~250,000 entries
would take ~256KB for essentially no value. By not storing a filter for
this commit, we go immediately to the regular TREESAME check, which
would happen for most pathspecs.

This is all to say: having a maximum size is good. 512 is big enough to
cover _most_ commits, but not so big that we may store _really_ big
filters.

Thanks,
-Stolee
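The scheme under discussion (a per-commit filter sized by the number of changed paths, with a hard cap above which no filter is stored) can be modeled in a few lines. This is an illustrative Python sketch of the idea only, not git's implementation; the constants below are assumptions for illustration, not the series' actual tuning:

```python
import hashlib

BITS_PER_ENTRY = 10    # assumed tuning, roughly targeting ~1% false positives
NUM_HASHES = 7
MAX_CHANGED_PATHS = 512

def make_filter(changed_paths):
    """Return a bit array sized to the change count (minimum one 64-bit
    word), or None for commits over the cap; those commits fall back to
    the normal TREESAME check instead of storing a huge filter."""
    if len(changed_paths) > MAX_CHANGED_PATHS:
        return None
    nbits = max(64, len(changed_paths) * BITS_PER_ENTRY)
    bits = [False] * nbits
    for path in changed_paths:
        for i in range(NUM_HASHES):
            h = hashlib.sha1(f"{i}:{path}".encode()).digest()
            bits[int.from_bytes(h[:8], "big") % nbits] = True
    return bits

def maybe_contains(bits, path):
    """'True' means 'maybe changed' (must verify); 'False' means
    definitely not changed. A missing filter always answers 'maybe'."""
    if bits is None:
        return True
    nbits = len(bits)
    return all(
        bits[int.from_bytes(hashlib.sha1(f"{i}:{path}".encode()).digest()[:8],
                            "big") % nbits]
        for i in range(NUM_HASHES))
```

The cap is what keeps the whole-tree-change commit described above from costing ~256KB of filter data for no benefit: `make_filter` simply declines, and queries against that commit go straight to the regular check.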
Re: [PATCH 0/4] Bloom filter experiment
Derrick Stolee writes:

> 2. The filters are sized according to the number of changes in each
> commit, with a minimum of one 64-bit word.
> ...
> 6. When we compute the Bloom filters, we don't store a filter for
> commits whose first-parent diff has more than 512 paths.

Just being curious but was 512 taken out of thin air or is there some
math behind it, e.g. to limit false positive rate down to certain
threshold? With a wide-enough bitset, you could store arbitrary large
number of paths with low enough false positive, I guess, but is there a
point where there is too many paths in the change that gives us
diminishing returns and not worth having a filter in the first place?

In a normal source-code-control context, the set of paths modified by
any single commit ought to be a small subset of the entire paths, and
whole-tree changes ought to be fairly rare. In a project for which that
assumption does not hold, it might help to have a negative bloom filter
(i.e. "git log -- A" asks "does the commit modify A?" and the filter
would say "we know it does not, because we threw all the paths that are
not touched to the bloom filter"), but I think that would optimize for
a wrong case.
Re: [PATCH 0/4] Bloom filter experiment
On 10/9/2018 3:34 PM, SZEDER Gábor wrote: To keep the ball rolling, here is my proof of concept in a somewhat cleaned-up form, with still plenty of rough edges. Peff, Szeder, and Jonathan, Thanks for giving me the kick in the pants to finally write a proof of concept for my personal take on how this should work. My implementation borrows things from both Szeder and Jonathan's series. You can find my commits for all of the versions on GitHub (it's a bit too messy to share as a patch series right now, I think): Repo: https://github.com/derrickstolee/git Branches: bloom/* (includes bloom/stolee, bloom/peff, bloom/szeder, and bloom/tan for the respective implementations, and bloom/base as the common ancestor) My implementation uses the following scheme: 1. Bloom filters are computed and stored on a commit-by-commit basis. 2. The filters are sized according to the number of changes in each commit, with a minimum of one 64-bit word. 3. The filters are stored in the commit-graph using two new optional chunks: one stores a single 32-bit integer for each commit that provides the end of its Bloom filter in the second "data" chunk. The data chunk also stores the magic constants (hash version, num hash keys, and num bits per entry). 4. We fill the Bloom filters as (const char *data, int len) pairs as "struct bloom_filter"s in a commit slab. 5. In order to evaluate containment, we need the struct bloom_filter, but also struct bloom_settings (stores the magic constants in one place), and struct bloom_key (stores the _k_ hash values). This allows us to hash a path once and test the same path against many Bloom filters. 6. When we compute the Bloom filters, we don't store a filter for commits whose first-parent diff has more than 512 paths. 7. When we compute the commit-graph, we can re-use the pre-existing filters without needing to recompute diffs. (Caveat: the current implementation will re-compute diffs for the commits with diffs that were too large.) 
You can build the Bloom filters in my implementation this way: GIT_TEST_BLOOM_FILTERS=1 ./git commit-graph write --reachable You can play around with it like this: $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write Computing commit graph generation numbers: 100% (52801/52801), done. Computing bloom filter: 100% (52801/52801), done. # Yeah, I even added progress indicator! :) $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh 886 20:40:24.783699 revision.c:486 bloom filter total queries: 66095 definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873 Jonathan used this same test, so will I. Here is a summary table: | Implementation | Queries | Maybe | FP # | FP % | ||-|---|--|---| | Szeder | 66095 | 1142 | 256 | 0.38% | | Jonathan | 66459 | 107 | 89 | 0.16% | | Stolee | 53025 | 492 | 479 | 0.90% | (Note that we must have used different starting points, which is why my "Queries" is so much smaller.) The increase in false-positive percentage is expected in my implementation. I'm using the correct filter sizes to hit a <1% FP ratio. This could be lowered by changing the settings, and the size would dynamically grow. For my Git repo (which contains git-for-windows/git and microsoft/git) this implementation grows the commmit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to Szeder's 8MB filter). For 105,260 commits, that rounds out to less than 20 bytes per commit (compared to Jonathan's 256 bytes per commit). Related stats for my Linux repo: 781,756 commits, commit-graph grows from 43.8 to 55.6 MB (~12 MB additional, ~16 bytes per commit). I haven't done a side-by-side performance test for these implementations, but it would be interesting to do so. Despite writing a lot of code in a short amount of time, there is a lot of work to be done before this is submittable: 1. There are three different environment variables right now. 
   It would be better to have one GIT_TEST_ variable and rely on
   existing tracing for logs (trace2 values would be good here).

2. We need config values for writing and consuming Bloom filters, but
   also to override the default settings.

3. My bloom.c/bloom.h is too coupled to the commit-graph. I want to
   harden that interface to let Bloom filters live as their own thing,
   but that the commit-graph could load a Bloom filter from the file
   instead of from the slab.

4. Tests, tests, and more tests.

We'll see how much time I have to do this polish, but I think the
benefit is proven.

Thanks,
-Stolee
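Points 2, 4, and 5 of the scheme above can be sketched in miniature. To be clear, this is not Stolee's actual code: the constants (10 bits per entry, 7 hashes) and the double-hashing scheme used to derive the k probe values are assumptions, chosen only to illustrate the design of hashing a path once into a "bloom key" and then testing that one key against many per-commit filters, each sized by its number of changed paths:

```python
import hashlib

BITS_PER_ENTRY = 10   # assumed "magic constants"; the real ones live
NUM_HASHES = 7        # in the data chunk / struct bloom_settings

def bloom_key(path):
    # Hash the path once; derive the k probe values via double hashing,
    # so the same key can be tested against many differently sized filters.
    digest = hashlib.sha256(path.encode()).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little")
    return [h1 + i * h2 for i in range(NUM_HASHES)]

class BloomFilter:
    """One filter per commit, sized by the number of changed paths,
    with a minimum of one 64-bit word."""
    def __init__(self, changed_paths):
        bits = max(64, BITS_PER_ENTRY * len(changed_paths))
        self.nbits = (bits + 63) // 64 * 64   # round up to whole words
        self.words = [0] * (self.nbits // 64)
        for path in changed_paths:
            for probe in bloom_key(path):
                self._set(probe % self.nbits)

    def _set(self, bit):
        self.words[bit // 64] |= 1 << (bit % 64)

    def maybe_contains(self, key):
        # False means "definitely did not change this path"; True means "maybe".
        return all(self.words[(p % self.nbits) // 64] >> ((p % self.nbits) % 64) & 1
                   for p in key)

# The key is computed once, then tested against every commit's filter:
commits = {
    "c1": BloomFilter(["Makefile", "bloom.c"]),
    "c2": BloomFilter(["revision.c"]),
}
key = bloom_key("bloom.c")
hits = [c for c, f in commits.items() if f.maybe_contains(key)]
```

The useful invariant is the absence of false negatives: a commit that touched the path always answers "maybe", so the walk can safely skip only the "definitely not" commits.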
[PATCH 0/4] More merge cleanups
This series builds on en/merge-cleanup. Technically, this could be
broken into three separate topics with only one of them depending on
en/merge-cleanup, but I have a subsequent series that depends on both
en/merge-cleanup and the fixes here, so I'm submitting these four
patches as a set.

Elijah Newren (4):
  t6036: add testcase where virtual merge base contains nested conflicts
  merge-recursive: increase marker length with depth of recursion
      Improving diff3 conflict markers in the face of arbitrarily
      deeply nested conflicts
  merge-recursive: improve auto-merging messages with path collisions
      Simple attempt at wording improvement
  merge-recursive: Avoid showing conflicts with merge branch before HEAD
      Conflict markers are expected to be shown in a certain order

 ll-merge.c                        |   4 +-
 ll-merge.h                        |   1 +
 merge-recursive.c                 |  59 +--
 t/t6036-recursive-corner-cases.sh | 159 +-
 4 files changed, 208 insertions(+), 15 deletions(-)

-- 
2.19.0.235.g7c386e1068
Re: [PATCH 0/4] Bloom filter experiment
On 10/9/2018 3:34 PM, SZEDER Gábor wrote:
> To keep the ball rolling, here is my proof of concept in a somewhat
> cleaned-up form, with still plenty of rough edges.
>
> You can play around with it like this:
>
>   $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
>   Computing commit graph generation numbers: 100% (52801/52801), done.
>   Computing bloom filter: 100% (52801/52801), done.
>   # Yeah, I even added progress indicator! :)
>
>   $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
>   886
>   20:40:24.783699 revision.c:486 bloom filter total queries: 66095 definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873
>
> The value of $GIT_USE_POC_BLOOM_FILTER only really matters when
> writing the Bloom filter, and it specifies the number of bits in the
> filter's bitmap, IOW the above command creates an 8MB Bloom filter.
> To make use of the filter the variable can be anything non-empty.
>
> Writing the Bloom filter is very slow as it is (yeah, that's why I
> bothered with the progress indicator ;). I wrote about it in patch
> 2's commit message: the cause for about half of the slowness is
> rather obvious, but I don't (yet) know what's responsible for the
> other half.
>
> Not a single test... but I've run loops over all files in git.git
> comparing 'git rev-list HEAD -- $file's output with and without the
> Bloom filter, and, surprisingly, they match. My quick'n'dirty
> experiments usually don't fare this well...
>
> It's also available at:
>
>   https://github.com/szeder/git bloom-filter-experiment

Thanks! I will take a close look at this tomorrow and start playing
with it.

-Stolee
[PATCH 0/4] Bloom filter experiment
To keep the ball rolling, here is my proof of concept in a somewhat
cleaned-up form, with still plenty of rough edges.

You can play around with it like this:

  $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
  Computing commit graph generation numbers: 100% (52801/52801), done.
  Computing bloom filter: 100% (52801/52801), done.
  # Yeah, I even added progress indicator! :)

  $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
  886
  20:40:24.783699 revision.c:486 bloom filter total queries: 66095 definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873

The value of $GIT_USE_POC_BLOOM_FILTER only really matters when writing
the Bloom filter, and it specifies the number of bits in the filter's
bitmap, IOW the above command creates an 8MB Bloom filter. To make use
of the filter the variable can be anything non-empty.

Writing the Bloom filter is very slow as it is (yeah, that's why I
bothered with the progress indicator ;). I wrote about it in patch 2's
commit message: the cause for about half of the slowness is rather
obvious, but I don't (yet) know what's responsible for the other half.

Not a single test... but I've run loops over all files in git.git
comparing 'git rev-list HEAD -- $file's output with and without the
Bloom filter, and, surprisingly, they match. My quick'n'dirty
experiments usually don't fare this well...
It's also available at:

  https://github.com/szeder/git bloom-filter-experiment

SZEDER Gábor (4):
  Add a (very) barebones Bloom filter implementation
  commit-graph: write a Bloom filter containing changed paths for each commit
  revision.c: use the Bloom filter to speed up path-limited revision walks
  revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics

 Makefile       |   1 +
 bloom-filter.c | 103 +++
 bloom-filter.h |  39 +++
 commit-graph.c | 116 
 pathspec.h     |   1 +
 revision.c     | 129 +
 6 files changed, 389 insertions(+)
 create mode 100644 bloom-filter.c
 create mode 100644 bloom-filter.h

-- 
2.19.1.409.g0a0ee5eb6b
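For reference, the expected false-positive rate of a Bloom filter with m bits, n inserted entries, and k hash functions is (1 - e^(-kn/m))^k. A quick sketch for the 8MB filter above; the entry count and hash counts here are hypothetical, picked only to show how roomy an 8MB bitmap is:

```python
import math

def bloom_fp_rate(m_bits, n_entries, k_hashes):
    # Expected false-positive probability: (1 - e^(-k*n/m))^k.
    return (1 - math.exp(-k_hashes * n_entries / m_bits)) ** k_hashes

m = 8 * 1024 * 1024 * 8      # the 8MB filter from above, in bits
n = 500_000                  # hypothetical number of inserted entries
rates = {k: bloom_fp_rate(m, n, k) for k in (1, 4, 7)}
```

With these (assumed) numbers, even a single hash function stays well under 1% expected false positives, which is the same order of magnitude as the 0.003873 ratio observed in the trace output above.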
[PATCH 0/4] Multiple subtree split fixes regarding complex repos
We recently (about eight months ago) transitioned to git source control
for several very large, very complex systems. We brought over several
active versions requiring maintenance updates, and also set up several
subtree repos to manage code shared between the systems. Recently, we
attempted to push updates back to those subtrees and encountered
errors. I believe I have identified and corrected the errors we found
in our repos, and would like to contribute those fixes back.

Commands to demonstrate both failures using the current version of the
subtree script are here:

  https://gist.github.com/FoxFireX/1b794384612b7fd5e7cd157cff96269e

Short summary of the three problems involved:

1. A split using rejoins fails in some cases where a commit has a
   parent which was a parent commit further upstream from a rejoin,
   causing a new initial commit to be created which is not related to
   the original subtree commits.

2. A split using rejoins fails to generate a merge commit which might
   have triaged the previous problem, instead electing to use only the
   parent which is not connected to the original subtree commits.
   (This may occur when the commit and both parents all share the same
   subtree hash.)

3. A split ignoring joins also ignores the original add commit, which
   causes content prior to the add to be considered part of the
   subtree graph, changing the commit hashes so the result is not
   connected to the original subtree commits.

The following commits address each problem individually, along with a
single commit that makes no functional change but performs a small
refactor of the existing code. Hopefully that will make reviewing it a
simpler task.

This is my first attempt at submitting a patch back, so apologies if
I've made any errors in the process.
Strain, Roger L (4):
  subtree: refactor split of a commit into standalone method
  subtree: make --ignore-joins pay attention to adds
  subtree: use commits before rejoins for splits
  subtree: improve decision on merges kept in split

 contrib/subtree/git-subtree.sh | 129 +
 1 file changed, 83 insertions(+), 46 deletions(-)

-- 
2.19.0.windows.1
[PATCH 0/4] git-commit-graph.txt: various cleanups
The first patch is a bug-fix. The second applies some more
`monospace`-ing, which should also be a good thing.

The last two patches are based on my understanding that `git
commit-graph` handles the "commit graph file", without a dash. If
that's correct, there might be more such cleanups to be made in other
parts of git.git. If the dash should actually be there, I could do
these changes in the other direction. Or maybe dash-vs-no-dash is not
an actual problem at all...

Martin

Martin Ågren (4):
  git-commit-graph.txt: fix bullet lists
  git-commit-graph.txt: typeset more in monospace
  git-commit-graph.txt: refer to "*commit* graph file"
  git-commit-graph.txt: refer to the "commit graph file" without dash

 Documentation/git-commit-graph.txt | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

-- 
2.19.0.216.g2d3b1c576c
[PATCH 0/4] Cleanup of merge_*() functions in merge-recursive
While working on a series to make file-collision conflict handling
consistent, I discovered a latent bug in merge_content()... and several
opportunities for cleanup. This series fixes that bug, deletes the
merge_file_special_markers() and merge_file_one() functions, and
renames a pair of functions to give them more descriptive names.

Elijah Newren (4):
  merge-recursive: set paths correctly when three-way merging content
  merge-recursive: avoid wrapper function when unnecessary and wasteful
  merge-recursive: remove final remaining caller of merge_file_one()
  merge-recursive: rename merge_file_1() and merge_content()

 merge-recursive.c | 144 --
 1 file changed, 51 insertions(+), 93 deletions(-)

-- 
2.19.0.12.gc6760fd9a9
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
Jeff King writes:

>> Either is fine. I am not moving 'next' beyond what is necessary for
>> this release cycle during the pre-release freeze period, and because
>> I thought that Peff was in favor of squashing in the changes to the
>> original when the next cycle starts, I haven't bothered to merge the
>> fix there yet.
>
> I think Ævar's point is just that the current tip of next is badly
> broken if you use bitmaps, and it's worth hot-fixing that in the
> meantime.

I know that ;-) My point is that anybody who *needs* handholding from
the upstream project (i.e. not Ævar, but those whom he is trying to
help with the suggestion) should not be looking at 'next' during the
prerelease freeze. They should be looking at 'master' instead, and
that is why I am not spending more than minimum cycles of my own
worrying about 'next' during the prerelease.

In any case, I think the final is ready to go, almost. I still need to
do the preformatted docs, though.
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
On Mon, Sep 10, 2018 at 09:53:56AM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason writes:
>
> > I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> > building my own package of git, but I think this really should be fixed
> > in that branch, either by merging the fix down or reverting the original
> > series out of next. I think just merging the fix down makes sense, but
> > have no strong opinion on it.
>
> Either is fine. I am not moving 'next' beyond what is necessary for
> this release cycle during the pre-release freeze period, and because
> I thought that Peff was in favor of squashing in the changes to the
> original when the next cycle starts, I haven't bothered to merge the
> fix there yet.

I think Ævar's point is just that the current tip of next is badly
broken if you use bitmaps, and it's worth hot-fixing that in the
meantime.

After the release, I'm OK with either one of squashing that first
patch in, or just merging as-is.

-Peff
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
Ævar Arnfjörð Bjarmason writes:

> I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> building my own package of git, but I think this really should be fixed
> in that branch, either by merging the fix down or reverting the original
> series out of next. I think just merging the fix down makes sense, but
> have no strong opinion on it.

Either is fine. I am not moving 'next' beyond what is necessary for
this release cycle during the pre-release freeze period, and because I
thought that Peff was in favor of squashing in the changes to the
original when the next cycle starts, I haven't bothered to merge the
fix there yet.
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
On Sat, Sep 01 2018, Jeff King wrote:

> On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:
>
>> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > On Tue, Aug 21 2018, Jeff King wrote:
>> >
>> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
>> > > +                                     const unsigned char *sha1)
>> > > +{
>> > > +    int pos;
>> > > +
>> > > +    if (!bitmap_git)
>> > > +        return 0; /* no bitmap loaded */
>> > > +    if (!bitmap_git->result)
>> > > +        BUG("failed to perform bitmap walk before querying");
>> >
>> > Some part of what calls this completely breaks pushing from the "next"
>> > branch when you have local bitmaps (we *really* should have some tests
>> > for this...).
>>
>> Yikes, thanks for reporting. I agree we need better tests here.
>
> OK, here is the fix. Since the problem is in 'next', this is done as a
> patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
> rewind 'next' post-release anyway, we could squash it directly into
> 30cdc33fba from the original series. That would help later bisections
> from running into it, which may be worth it as it's a pretty severe
> breakage. Or maybe not:

Junio: Just a reminder that next is still broken with this, and I see
that e.g. the Debian "experimental" package has the bug but not the fix
at this point.

I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
building my own package of git, but I think this really should be
fixed in that branch, either by merging the fix down or reverting the
original series out of next. I think just merging the fix down makes
sense, but have no strong opinion on it.
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
On Tue, Sep 04, 2018 at 12:30:14PM -0700, Stefan Beller wrote:

> > [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check
> >
> >   The actual fix. This should get merged to next ASAP (or the
> >   original topic just reverted).
> >
> > [2/4]: t5310: test delta reuse with bitmaps
> >
> >   I did this separately to give us flexibility to squash or merge
> >   quickly. But it does find Ævar's bug on a git without patch 1.
> >
> > [3/4]: traverse_bitmap_commit_list(): don't free result
> >
> >   The original assert should have simply been useless, but it was
> >   the surprising behavior of this function that turned it into a bug.
> >
> > [4/4]: pack-bitmap: drop "loaded" flag
> >
> >   And this is just an annoyance I ran into, which is a fallout from
> >   our conversion to using an allocated bitmap_index struct.
>
> FWIW, the whole series is
>
> Reviewed-by: Stefan Beller
>
> I am sorry for suggesting the BUG() in the first place.

Heh. You only asked a question about the interface. I was the one who
was _supposed_ to be familiar with the code, and put in the actual
assertion. So if your suggestion was dumb, my response was doubly
so. :)

-Peff
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
> [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check
>
>   The actual fix. This should get merged to next ASAP (or the
>   original topic just reverted).
>
> [2/4]: t5310: test delta reuse with bitmaps
>
>   I did this separately to give us flexibility to squash or merge
>   quickly. But it does find Ævar's bug on a git without patch 1.
>
> [3/4]: traverse_bitmap_commit_list(): don't free result
>
>   The original assert should have simply been useless, but it was
>   the surprising behavior of this function that turned it into a bug.
>
> [4/4]: pack-bitmap: drop "loaded" flag
>
>   And this is just an annoyance I ran into, which is a fallout from
>   our conversion to using an allocated bitmap_index struct.

FWIW, the whole series is

Reviewed-by: Stefan Beller

I am sorry for suggesting the BUG() in the first place.

Stefan
[PATCH 0/4] un-breaking pack-objects with bitmaps
On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:

> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> > On Tue, Aug 21 2018, Jeff King wrote:
> >
> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
> > > +                                     const unsigned char *sha1)
> > > +{
> > > +    int pos;
> > > +
> > > +    if (!bitmap_git)
> > > +        return 0; /* no bitmap loaded */
> > > +    if (!bitmap_git->result)
> > > +        BUG("failed to perform bitmap walk before querying");
> >
> > Some part of what calls this completely breaks pushing from the "next"
> > branch when you have local bitmaps (we *really* should have some tests
> > for this...).
>
> Yikes, thanks for reporting. I agree we need better tests here.

OK, here is the fix. Since the problem is in 'next', this is done as a
patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
rewind 'next' post-release anyway, we could squash it directly into
30cdc33fba from the original series. That would help later bisections
from running into it, which may be worth it as it's a pretty severe
breakage. Or maybe not:

  1. The test suite doesn't actually fail, because its toy repos are
     too small.

  2. It only triggers in the real world if you have bitmaps turned on,
     which are not the default.

So it may not be that likely in practice to bother a hypothetical
future bisecting developer.

> [1] Actually, there is also prepare_bitmap_git(), but it is not really
>     for general use by callers. It should be made static, or better
>     yet, I suspect it can be folded into its callers.

This actually turned out not to work. There's a caller over in
pack-bitmap-write.c, and it makes things worse to try to expand the
logic there. So it technically _is_ possible to have a bitmap_index
without a "have" field, but it also doesn't make sense to ask about
"uninteresting" objects there. You haven't done (and cannot do) a
traversal on such an object.
Which I think goes back to Stefan's original question: is this just a
crappy API? And the answer is "yes, to some degree". There are really
two uses of bitmaps:

  - you want to do a traverse_commit_list() walk, but faster

  - you want to selectively query the on-disk bitmaps (e.g., you are
    walking for --contains and want to ask "do we have a bitmap for
    this object?")

Those currently use the same struct bitmap_index, but with two
different constructors (prepare_bitmap_git and prepare_bitmap_walk).
It probably ought to be two different ones (with the "walk" variant
using the "query" variant under the hood). I've punted on that full
conversion for now, but did clean up a few confusing bits.

  [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check

      The actual fix. This should get merged to next ASAP (or the
      original topic just reverted).

  [2/4]: t5310: test delta reuse with bitmaps

      I did this separately to give us flexibility to squash or merge
      quickly. But it does find Ævar's bug on a git without patch 1.

  [3/4]: traverse_bitmap_commit_list(): don't free result

      The original assert should have simply been useless, but it was
      the surprising behavior of this function that turned it into a
      bug.

  [4/4]: pack-bitmap: drop "loaded" flag

      And this is just an annoyance I ran into, which is a fallout
      from our conversion to using an allocated bitmap_index struct.

 pack-bitmap.c           | 14 ++-
 pack-bitmap.h           |  2 +-
 t/t5310-pack-bitmaps.sh | 93 +
 3 files changed, 97 insertions(+), 12 deletions(-)

-Peff
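The two-layer split suggested above can be sketched abstractly. None of these names are git's actual API; this is only an illustration of the proposed layering, with a query-only object wrapped by a walk-capable one, under the assumption that the "uninteresting" state belongs to the walk:

```python
class BitmapQuery:
    """Query-only view of the on-disk bitmaps: "do we have a bitmap for
    this object?" No traversal has happened, so there is no notion of
    "uninteresting" objects at this layer."""
    def __init__(self, on_disk):
        self._bitmaps = on_disk          # e.g. {oid: bitmap}

    def has_bitmap(self, oid):
        return oid in self._bitmaps

class BitmapWalk:
    """Walk-capable object built on the query layer; only after the
    walk has run does asking about uninteresting objects make sense."""
    def __init__(self, query):
        self._query = query
        self._result = None

    def traverse(self, wants, haves):
        # Stand-in for the real bitmap-assisted traversal.
        self._result = set(haves)

    def has_oid_in_uninteresting(self, oid):
        if self._result is None:
            raise RuntimeError("walk must run before querying")
        return oid in self._result
```

The point of the split is that a caller holding only a BitmapQuery cannot even ask the question that triggered the BUG().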
Re: [PATCH 0/4] tests: make more use of 'test_must_be_empty'
Note that you can also trivially convert a lot more instances of empty
checking (those using 'test_line_count = 0') to the "right" way:

  $ grep -Ir 'test_line_count = 0' t | wc -l
  76

I think it would be nice to clean these up as well.
[PATCH 0/4] tests: make more use of 'test_must_be_empty'
This series is a continuation of, and applies on top of,
'ab/test-must-be-empty-for-master', and converts even more places to
use 'test_must_be_empty'.

There are still a bunch of cases in the form of 'test -z "$(cmd...)"'
that could use 'test_must_be_empty'... maybe someday.

SZEDER Gábor (4):
  tests: use 'test_must_be_empty' instead of '! test -s'
  tests: use 'test_must_be_empty' instead of 'test ! -s'
  tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null '
  tests: use 'test_must_be_empty' instead of 'test_cmp '

 t/t-basic.sh                                 |  2 +-
 t/t0001-init.sh                              |  5 +--
 t/t0003-attributes.sh                        |  3 +-
 t/t0030-stripspace.sh                        | 22 --
 t/t0040-parse-options.sh                     |  4 +-
 t/t0061-run-command.sh                       |  9 ++--
 t/t0090-cache-tree.sh                        |  2 +-
 t/t0203-gettext-setlocale-sanity.sh          |  4 +-
 t/t1300-config.sh                            |  5 +--
 t/t1410-reflog.sh                            |  3 +-
 t/t1411-reflog-show.sh                       |  3 +-
 t/t1450-fsck.sh                              | 11 +++--
 t/t1501-work-tree.sh                         |  2 +-
 t/t1510-repo-setup.sh                        | 24 +--
 t/t1600-index.sh                             |  3 +-
 t/t1700-split-index.sh                       |  4 +-
 t/t2013-checkout-submodule.sh                |  6 +--
 t/t2200-add-update.sh                        |  3 +-
 t/t2203-add-intent.sh                        |  6 +--
 t/t2204-add-ignored.sh                       |  8 ++--
 t/t3001-ls-files-others-exclude.sh           |  6 +--
 t/t3004-ls-files-basic.sh                    |  6 +--
 t/t3035-merge-sparse.sh                      |  5 +--
 t/t3210-pack-refs.sh                         |  6 +--
 t/t3301-notes.sh                             |  4 +-
 t/t3308-notes-merge.sh                       |  2 +-
 t/t3310-notes-merge-manual-resolve.sh        |  8 ++--
 t/t3404-rebase-interactive.sh                |  5 +--
 t/t3600-rm.sh                                |  6 +--
 t/t4011-diff-symlink.sh                      |  2 +-
 t/t4015-diff-whitespace.sh                   |  4 +-
 t/t4019-diff-wserror.sh                      |  2 +-
 t/t4027-diff-submodule.sh                    | 29 ++---
 t/t4041-diff-submodule-option.sh             | 18 
 t/t4047-diff-dirstat.sh                      |  4 +-
 t/t4060-diff-submodule-option-diff-format.sh | 18 
 t/t4116-apply-reverse.sh                     |  2 +-
 t/t4124-apply-ws-rule.sh                     |  2 +-
 t/t4132-apply-removal.sh                     |  5 +--
 t/t4150-am.sh                                |  4 +-
 t/t4201-shortlog.sh                          |  2 +-
 t/t4203-mailmap.sh                           |  4 +-
 t/t4211-line-log.sh                          |  2 +-
 t/t4212-log-corrupt.sh                       |  6 +--
 t/t4300-merge-tree.sh                        | 34 +++
 t/t5304-prune.sh                             |  3 +-
 t/t5314-pack-cycle-detection.sh              |  3 +-
 t/t5401-update-hooks.sh                      | 10 ++---
 t/t5500-fetch-pack.sh                        |  2 +-
 t/t5509-fetch-push-namespaces.sh             |  2 +-
 t/t5523-push-upstream.sh                     |  2 +-
 t/t5526-fetch-submodules.sh                  | 45 ++--
 t/t5541-http-push-smart.sh                   |  2 +-
 t/t5570-git-daemon.sh                        |  2 +-
 t/t6112-rev-list-filters-objects.sh          |  3 +-
 t/t6120-describe.sh                          |  3 +-
 t/t6130-pathspec-noglob.sh                   |  9 ++--
 t/t6200-fmt-merge-msg.sh                     |  4 +-
 t/t7001-mv.sh                                |  6 +--
 t/t7004-tag.sh                               |  8 +---
 t/t7008-grep-binary.sh                       |  6 +--
 t/t7064-wtstatus-pv2.sh                      |  5 +--
 t/t7201-co.sh                                |  7 ++-
 t/t7400-submodule-basic.sh                   | 28 
 t/t7401-submodule-summary.sh                 |  2 +-
 t/t7406-submodule-update.sh                  |  2 +-
 t/t7501-commit.sh                            |  6 +--
 t/t7600-merge.sh                             |  9 ++--
 t/t7610-mergetool.sh                         |  3 +-
 t/t7810-grep.sh                              | 35 ++-
 t/t7811-grep-open.sh                         | 18 +++-
 t/t8010-cat-file-filters.sh                  |  2 +-
 t/t9011-svn-da.sh                            | 14 +++---
 t/t9131-git-svn-empty-symlink.sh             |  6 +--
 t/t9135-git-svn-moved-branch-empty-file.sh   |  3 +-
 t/t9200-git-cvsexportcommit.sh               |  4 +-
 t/t9802-git-p4-filetype.sh                   |  2 +-
 t/t9903-bash-prompt.sh                       |  6 +--
 78 files changed, 232 insertions(+), 345 deletions(-)

-- 
2.18.0.903.gab616d7dc6
[PATCH 0/4] finishing touches on jk/for-each-object-iteration
On Mon, Aug 13, 2018 at 11:45:06AM -0700, Jonathan Tan wrote:

> > [1/7]: for_each_*_object: store flag definitions in a single location
> > [2/7]: for_each_*_object: take flag arguments as enum
> > [3/7]: for_each_*_object: give more comprehensive docstrings
> > [4/7]: for_each_packed_object: support iterating in pack-order
> > [5/7]: t1006: test cat-file --batch-all-objects with duplicates
> > [6/7]: cat-file: rename batch_{loose,packed}_object callbacks
> > [7/7]: cat-file: support "unordered" output for --batch-all-objects
>
> Thanks for laying all the patches out so cleanly! All of them are:
> Reviewed-by: Jonathan Tan
>
> Normally I would re-explain the patches to demonstrate that I
> understand them, but in this case, I think they are simple enough -
> patches 1, 2, 3, and 6 are refactorings that I agree with, patch 5
> just makes a test more comprehensive, and patches 4 and 7 do what
> their commit messages say.
>
> Stefan brought up the concern that cache.h is increasing in size, but
> I agree with the patch as written that it's probably best that we
> centralize all the flags somewhere, and we can deal with the location
> in a future patch.

Thanks for the review. Here are a few patches on top to deal with the
cache.h thing, as well as some optimizations that came out of
discussing oidset in another thread (I left out for now the "big"
optimization of moving oidset to a different data structure; that's
complicated enough to be dealt with on its own, I think).

The first patch here could arguably be squashed into the final patch
of the original series, but I'm OK with it either way.
  [1/4]: cat-file: use oidset check-and-insert
  [2/4]: cat-file: split batch "buf" into two variables
  [3/4]: cat-file: use a single strbuf for all output
  [4/4]: for_each_*_object: move declarations to object-store.h

 builtin/cat-file.c     | 43 +++-
 builtin/prune-packed.c |  1 +
 cache.h                | 75 ---
 object-store.h         | 90 ++
 packfile.h             | 20 --
 5 files changed, 116 insertions(+), 113 deletions(-)

-Peff
[PATCH 0/4] Better colors in range-diff!
This improves the colors of range-diff; see the last patch for details.

This is a partial resend of
https://public-inbox.org/git/20180804015317.182683-1-sbel...@google.com/
and is also available via

  git fetch https://github.com/stefanbeller/git sb/range-diff-better-colors

It applies on the (just reset) series of sb/range-diff-colors.

Thanks,
Stefan

Stefan Beller (4):
  diff.c: emit_line_0 to take string instead of first sign
  diff.c: add --output-indicator-{new, old, context}
  range-diff: make use of different output indicators
  range-diff: indent special lines as context

 diff.c                | 43 +++
 diff.h                |  5 +
 range-diff.c          | 17 -
 t/t3206-range-diff.sh | 12 ++--
 4 files changed, 58 insertions(+), 19 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Hi Peff,

On Fri, 10 Aug 2018, Jeff King wrote:

> On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote:
>
> > So unless you are willing to ignore, to willfully keep this breakage,
> > I would suggest not to introduce the ugliness of an overridden
> > upload-pack for the sole purpose of disabling the tracing on one
> > side, but instead to get this here bug fixed, by helping me with
> > this here patch series.
>
> I'm OK if you want to live with the broken test in the interim.

I realize that I failed to tell you that I basically spent 2.5 days'
worth of work time to figure this out and come up with three iterations
of the patch series (you only saw the latest).

I want this patch series in git.git, maybe not in the current form, but
in one form or another. I don't want to spend another minute trying to
figure out the same problem with any other regression test (which might
not even be as easily worked around as with a semi-simple --upload-pack
option).

Thank you for wanting to help. Please accept my apologies for
expressing in a poor way that I appreciate your eagerness to provide a
patch, but that I think nevertheless that it would be better to work on
the GIT_TRACE concurrency problem and its resolution via file locks. It
would solve that whole class of problems, instead of just that single
flakey regression test.

Sorry,
Dscho
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Fri, Aug 10, 2018 at 11:34:59AM -0700, Junio C Hamano wrote:

> Johannes Sixt writes:
>
> > As this buglet looks like a recurring theme, a proper fix is
> > preferable over repeated work-arounds. To me it looks like we need
> > some sort of locking on Windows. Unless your friends at Microsoft
> > have an ace in their sleeves that lets us have atomic O_APPEND the
> > POSIX way...
>
> Just to put the severity of the issue in context, we use O_APPEND in
> a few codepaths, and the trace thing for debugging is the only thing
> that could have multiple writers. Other users of O_APPEND are:
>
>  * refs/files-backend.c uses it so that a reflog entry can be
>    appended at the end, but because update to each ref is protected
>    from racing at a lot higher layer with a lock, no two people
>    would try to append to the same reflog file, so atomicity of
>    O_APPEND does not matter here.

Just an interesting side note, but I think one of the reasons we use
dot-locks and not flock() is that the former generally works on
network filesystems. I think O_APPEND also is not reliable for
non-local files, though, so short of actually dot-locking trace files
(yuck) I don't think there's a great solution there.

> It may make sense to allow GIT_TRACE to have a placeholder
> (e.g. "/tmp/traceout.$$") to help debuggers arrange to give
> different processes their own trace output file, which perhaps may
> be a simpler and less impactful to the performance solution than
> having to make locks at an application layer.

Heh, I actually wrote that patch several years ago, as part of an
abandoned effort to have config-triggered tracing (e.g., on the server
side where you're getting hundreds of requests and you want to enable
tracing on a repo for a slice of time). One annoying thing for a case
like the test in question is that you don't actually know the pid of
the process you're interested in. So we'd generate two trace files,
and then you'd have to figure out which one is which.
In my earlier work, I coupled it with some magic to allow per-command
config, like:

  [trace "fetch"]
      packet = /tmp/trace.%p

but you could also imagine a "%n" which uses the command name.

I don't remember why I abandoned it, but I think a lot had to do with
violating the "don't look at config" rules for our repo startup code.
A lot of that has been fixed since then, but I haven't really needed
it.

If anybody is really interested, the patches are at:

  https://github.com/peff/git jk/trace-stdin

(my ultimate goal was to record stdin for pack-objects to replay slow
fetches, but I ended up hacking together a patch to pack-objects
instead).

-Peff
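The dot-lock pattern mentioned above is simple to sketch: creating a ".lock" sibling of the target file with O_CREAT|O_EXCL is an atomic test-and-set that works on most network filesystems, which is one reason git prefers it to flock(). This is a simplified illustration, not git's actual lockfile API (which also arranges cleanup when the holder dies):

```python
import os
import tempfile

class DotLock:
    """Take a lock on 'path' by exclusively creating 'path.lock'.
    O_CREAT|O_EXCL either creates the file or fails atomically."""
    def __init__(self, path):
        self.lockpath = path + ".lock"
        self.fd = None

    def acquire(self):
        try:
            self.fd = os.open(self.lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False          # someone else holds the lock

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.unlink(self.lockpath)
            self.fd = None

# Usage: two would-be lockers contend for the same file.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "config")
lock_a, lock_b = DotLock(target), DotLock(target)
got_a = lock_a.acquire()
got_b = lock_b.acquire()       # fails while lock_a holds the lock
lock_a.release()
got_b_retry = lock_b.acquire()
lock_b.release()
```

The real API additionally writes the new content into the ".lock" file and renames it into place on commit, so readers never see a half-written file.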
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Jeff King writes:

> I have tried to help with the actual problem, by identifying the root
> cause (that the trace code's strategy is not fundamentally broken, but
> that it relies on O_APPEND whose guarantees are obviously not portable
> to Windows) and exploring possible solutions there (FILE_APPEND_DATA).
>
> Lacking any kind of Windows development environment, I'd just as soon
> stop there and let more knowledgeable people take over. But please
> don't characterize my presence in this thread as some kind of willful
> harm or ignoring of the problem.

Thanks. It seems like the other Johannes and the other Jeff are also
interested, so we may hopefully see a properly fixed O_APPEND, which
would be the real solution that would help appenders that are *not*
the tracing subsystem, too.
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Johannes Sixt writes:

> As this buglet looks like a recurring theme, a proper fix is
> preferable over repeated work-arounds. To me it looks like we need
> some sort of locking on Windows. Unless your friends at Microsoft
> have an ace in their sleeves that lets us have atomic O_APPEND the
> POSIX way...

Just to put the severity of the issue in context, we use O_APPEND in a
few codepaths, and the trace thing for debugging is the only thing
that could have multiple writers. Other users of O_APPEND are:

 * refs/files-backend.c uses it so that a reflog entry can be
   appended at the end, but because update to each ref is protected
   from racing at a lot higher layer with a lock, no two people
   would try to append to the same reflog file, so atomicity of
   O_APPEND does not matter here.

 * sequencer.c wants to use it when moving one insn from the todo
   list to the 'done' list when it finishes one operation. If you are
   running two sequences in a single repository, an intermixed 'done'
   list would be the least of your problems, so presumably we are
   fine here.

It may make sense to allow GIT_TRACE to have a placeholder
(e.g. "/tmp/traceout.$$") to help debuggers arrange to give different
processes their own trace output file, which perhaps may be a simpler
and less performance-impactful solution than having to take locks at
an application layer.
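The placeholder idea is easy to prototype. To be clear, no such expansion exists in git's GIT_TRACE handling; this sketch only illustrates the proposal, treating both the "$$" form suggested here and the "%p"/"%n" forms floated elsewhere in the thread as hypothetical syntax:

```python
import os

def expand_trace_path(template, pid=None, command=None):
    """Expand a hypothetical GIT_TRACE template so that each process
    gets its own trace file instead of sharing one O_APPEND target."""
    pid = os.getpid() if pid is None else pid
    out = template.replace("$$", str(pid)).replace("%p", str(pid))
    if command is not None:
        out = out.replace("%n", command)   # e.g. the command name, "fetch"
    return out
```

With per-process files there is nothing to interleave, so the atomicity of O_APPEND stops mattering, at the cost of having to find out afterwards which file belongs to which process.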
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote:

> > > Correct. But I am more worried about the "mixed/overwriting"
> > > breakage, if there is one; it means we may need to be prepared for
> > > systems that lack O_APPEND that works correctly. I initially just
> > > assumed that it was what Dscho was seeing, but after re-reading his
> > > message, I am not sure anymore.
> > >
> > > I think the "do not trace the other side" approach you suggest for
> > > these tests that only care about one side is a more appropriate
> > > solution for this particular case. We then do not have to worry
> > > about overwriting or output from both sides mixed randomly.
>
> Just so everybody knows: I never received the above two mails.
> Something is seriously rotten in GMX. This seems to have started a
> week or two ago.

I think it may just have been that they went to the gitgitgadget
address; I explicitly added you as a cc when I sent the patch with a
commit message, which was when I first noticed you weren't cc'd on the
earlier bits.

> What you cannot see easily, unless you go the route that I offered
> Jonathan (open a PR on gitgitgadget/git which will automatically run
> the test suite on Windows, macOS and Linux, and of course you can do
> anything in a PR, including narrowing down what tests are run) is
> that sometimes those lines in the `trace` file were clearly
> *incomplete*. That is, even if both processes tried to write
> atomically, one managed to overwrite the other's buffer in flight.
>
> This is a pretty serious thing, even worse than the failing test
> suite, and I don't think that your patch acknowledges how serious
> this is.

I don't see anything in what you wrote above or in your original cover
letter or series that contradicts the notion that O_APPEND is simply
not atomic on Windows. Overwriting the same byte ranges in the file is
what I would expect on Linux, as well, if the file were opened without
O_APPEND.
I have mixed reports from searching online whether an equivalent to O_APPEND exists. Some sources seem to say that FILE_APPEND_DATA is such an equivalent. I couldn't find anything definitive in Microsoft documentation about whether it actually provides atomicity. From my (admittedly limited) digging, that does not seem to be used by the msys2 runtime, so it might be worth exploring. > And please don't give me that "but it works on Linux" response. Seriously. > Sheesh. But it does. And it's designed to. There is literally nothing to fix on Linux. > Even if you have not managed to trigger it, the POSIX standard seems not > to define clearly what the behavior is of two competing write() calls, > unless you lock the files appropriately, as my patch series does. POSIX says: If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation. and: This volume of POSIX.1-2017 does not specify the behavior of concurrent writes to a regular file from multiple threads, except that each write is atomic. I admit the writing is a little turgid, but I think between the two it is promising the behavior that we expect (and I think that interpretation is pretty common in the Unix world). > So unless you are willing to ignore, to willfully keep this breakage, I > would suggest not to introduce the ugliness of an overridden upload-pack > for the sole purpose of disabling the tracing on one side, but instead to > get this here bug fixed, by helping me with this here patch series. I'm OK if you want to live with the broken test in the interim. But after your complaining about the imminence of it "infecting" master, I thought you might be happy to see a solution that was more immediate.
I have tried to help with the actual problem, by identifying the root cause (that the trace code's strategy is not fundamentally broken, but that it relies on O_APPEND whose guarantees are obviously not portable to Windows) and exploring possible solutions there (FILE_APPEND_DATA). Lacking any kind of Windows development environment, I'd just as soon stop there and let more knowledgeable people take over. But please don't characterize my presence in this thread as some kind of willful harm or ignoring of the problem. -Peff
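Peff's claim that there is nothing to fix on Linux, because O_APPEND positioning plus one complete line per write() keeps concurrent trace lines intact, can be checked directly. The following is my own minimal sketch, not code from the thread (the demo_writer()/demo_o_append() names are invented): two forked writers append tagged lines to one shared file, and the reader then verifies that every line came through whole.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_LINES 1000

/* One writer process: mimic the trace code, one complete line per write(). */
static void demo_writer(const char *path, const char *tag)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	char buf[64];
	int i;

	if (fd < 0)
		_exit(1);
	for (i = 0; i < DEMO_LINES; i++) {
		int len = snprintf(buf, sizeof(buf), "%s line %d\n", tag, i);
		if (write(fd, buf, len) != len)
			_exit(1);
	}
	close(fd);
	_exit(0);
}

/* Returns 0 if every line in the shared file is intact, non-zero otherwise. */
int demo_o_append(const char *path)
{
	FILE *fp;
	char line[64];
	long fetch = 0, upload = 0;
	int torn = 0;

	unlink(path);
	if (fork() == 0)
		demo_writer(path, "fetch>");
	if (fork() == 0)
		demo_writer(path, "upload<");
	wait(NULL);
	wait(NULL);

	fp = fopen(path, "r");
	if (!fp)
		return -1;
	while (fgets(line, sizeof(line), fp)) {
		if (!strncmp(line, "fetch> line ", 12))
			fetch++;
		else if (!strncmp(line, "upload< line ", 13))
			upload++;
		else {
			torn = 1;	/* a torn or interleaved line */
			break;
		}
	}
	fclose(fp);
	if (torn)
		return 1;
	return (fetch == DEMO_LINES && upload == DEMO_LINES) ? 0 : 1;
}
```

On Linux this reliably finds all 2000 lines intact; a platform where O_APPEND is not honored atomically would hit the torn-line branch, which is exactly the kind of corruption Dscho reports seeing on Windows.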
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Am 10.08.2018 um 18:51 schrieb Jeff Hostetler: On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I looked at the CRT code, too, and no, we can't trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) As far as I understand, FILE_APPEND_DATA is just a permission flag (I'm not sure why that would be necessary), but at least the documentation states that it is still necessary for the caller to set the file pointer. But I haven't tried your code below, yet. 
Should it append the line of text on each invocation? And if so, is it an atomic operation? Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. Jeff

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On 8/10/2018 12:51 PM, Jeff Hostetler wrote: On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) d'oh s/GENERIC_READ/GENERIC_WRITE/ Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. 
Jeff

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. 
Jeff

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Hi, On Fri, 10 Aug 2018, Jeff King wrote: > On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote: > > > Jeff King writes: > > > > > Are you sure that it's not well-defined? We open the path with O_APPEND, > > > which means every write() will be atomically positioned at the end of > > > file. So we would never lose or overwrite data. > > > > > > We do our own buffering in a strbuf, writing the result out in a single > > > write() call (modulo the OS returning a short write, but that should not > > > generally happen when writing short strings to a file). So we should get > > > individual trace lines as atomic units. > > > > > > The order of lines from the two processes is undefined, of course. > > > > Correct. But I am more worried about the "mixed/overwriting" > > breakage, if there is one; it means we may need to be prepared for > > systems that lack O_APPEND that works correctly. I initially just > > assumed that it was what Dscho was seeing, but after re-reading his > > message, I am not sure anymore. > > > > I think the "do not trace the other side" approach you suggest for > > these tests that only care about one side is more appropriate > > solution for this particular case. We then do not have to worry > > about overwriting or output from both sides mixed randomly. Just so everybody knows: I never received the above two mails. Something is seriously rotten in GMX. This seems to have started a week or two ago. > Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets > us pursue any fix for interleaved trace output on Windows without the > pressure of an impending flaky test. > > My gut says that looking into making O_APPEND work there is going to be > the nicest solution, but my gut is not very well versed in Windows > subtleties. ;) Your patch seems to be a patch in the original sense of the word: it is not a solution. 
And let me spell it out very clearly (as I have mentioned to Jonathan a couple times, without any response): t5552 fails regularly, way more than rarely, see for yourself: https://git-for-windows.visualstudio.com/git/_build/index?definitionId=1&_a=history And the reason it fails was analyzed by me and described in the commit message (I am sorry that the cover letter was still stale and talked about flock(), which I had decided was not the right approach after fighting with Linux over it). What you cannot see easily, unless you go the route that I offered Jonathan (open a PR on gitgitgadget/git which will automatically run the test suite on Windows, macOS and Linux, and of course you can do anything in a PR, including narrowing down what tests are run) is that sometimes those lines in the `trace` file were clearly *incomplete*. That is, even if both processes tried to write atomically, one managed to overwrite the other's buffer in flight. This is a pretty serious thing, even worse than the failing test suite, and I don't think that your patch acknowledges how serious this is. And please don't give me that "but it works on Linux" response. Seriously. Sheesh. Even if you have not managed to trigger it, the POSIX standard seems not to define clearly what the behavior is of two competing write() calls, unless you lock the files appropriately, as my patch series does. So unless you are willing to ignore, to willfully keep this breakage, I would suggest not to introduce the ugliness of an overridden upload-pack for the sole purpose of disabling the tracing on one side, but instead to get this here bug fixed, by helping me with this here patch series. We don't have to rush this, you know? We have to fix it, though. Thank you, Dscho > -- >8 -- > Subject: [PATCH] t5552: suppress upload-pack trace output > > The t5552 test script uses GIT_TRACE_PACKET to monitor what > git-fetch sends and receives. 
However, because we're > accessing a local repository, the child upload-pack also > sends trace output to the same file. > > On Linux, this works out OK. We open the trace file with > O_APPEND, so all writes are atomically positioned at the end > of the file. No data can be overwritten or omitted. And > since we prepare our small writes in a strbuf and write them > with a single write(), we should see each line as an atomic > unit. The order of lines between the two processes is > undefined, but the test script greps only for "fetch>" or > "fetch<" lines. So under Linux, the test results are > deterministic. > > The test fails intermittently on Windows, however, > reportedly even overwriting bits of the output file (i.e., > O_APPEND does not seem to give us an atomic position+write). > > Since the test only cares about the trace output from fetch, > we can just disable the output from upload-pack. That > doesn't solve the greater question of O_APPEND/trace issues > under Windows, but it easily fixes the flakiness from this > test. > > Reported-by: Johannes Schindelin > Signed-off-by: Jeff King > --- > I'm assuming that this really isn't triggerable on Linux.
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for an equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... -- Hannes
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Jeff King writes: > Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets > us pursue any fix for interleaved trace output on Windows without the > pressure of an impending flaky test. > > My gut says that looking into making O_APPEND work there is going to be > the nicest solution, but my gut is not very well versed in Windows > subtleties. ;) As we know these tests are only interested in "fetch" lines and not the record of communication in the other direction, I think this is a nice clean-up that is worth doing. For only two grep's it may not look worth doing, but it would also be a good clean-up to give a got_acked helper that sits next to have_sent and have_not_sent helpers for readability. That is of course outside the scope of this change. Will queue. Thanks. > -- >8 -- > Subject: [PATCH] t5552: suppress upload-pack trace output > > The t5552 test script uses GIT_TRACE_PACKET to monitor what > git-fetch sends and receives. However, because we're > accessing a local repository, the child upload-pack also > sends trace output to the same file. > > On Linux, this works out OK. We open the trace file with > O_APPEND, so all writes are atomically positioned at the end > of the file. No data can be overwritten or omitted. And > since we prepare our small writes in a strbuf and write them > with a single write(), we should see each line as an atomic > unit. The order of lines between the two processes is > undefined, but the test script greps only for "fetch>" or > "fetch<" lines. So under Linux, the test results are > deterministic. > > The test fails intermittently on Windows, however, > reportedly even overwriting bits of the output file (i.e., > O_APPEND does not seem to give us an atomic position+write). > > Since the test only cares about the trace output from fetch, > we can just disable the output from upload-pack. 
That > doesn't solve the greater question of O_APPEND/trace issues > under Windows, but it easily fixes the flakiness from this > test. > > Reported-by: Johannes Schindelin > Signed-off-by: Jeff King > --- > I'm assuming that this really isn't triggerable on Linux. I tried and > couldn't manage to get it to fail, and the reasoning above explains why. > But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows. > > t/t5552-skipping-fetch-negotiator.sh | 23 ++- > 1 file changed, 18 insertions(+), 5 deletions(-) > > diff --git a/t/t5552-skipping-fetch-negotiator.sh > b/t/t5552-skipping-fetch-negotiator.sh > index 0a8e0e42ed..0ad50dd839 100755 > --- a/t/t5552-skipping-fetch-negotiator.sh > +++ b/t/t5552-skipping-fetch-negotiator.sh > @@ -28,6 +28,19 @@ have_not_sent () { > done > } > > +# trace_fetch [args] > +# > +# Trace the packet output of fetch, but make sure we disable the variable > +# in the child upload-pack, so we don't combine the results in the same file. > +trace_fetch () { > + client=$1; shift > + server=$1; shift > + GIT_TRACE_PACKET="$(pwd)/trace" \ > + git -C "$client" fetch \ > + --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \ > + "$server" "$@" > +} > + > test_expect_success 'commits with no parents are sent regardless of skip > distance' ' > git init server && > test_commit -C server to_fetch && > @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent > regardless of skip distanc > # "c1" has no parent, it is still sent as "have" even though it would > # normally be skipped. > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c7 c5 c2 c1 && > have_not_sent c6 c4 c3 > ' > @@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the > larger one' ' > # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4" > # (from "c5side" skip 1). 
> test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c5side c11 c9 c6 c1 && > have_not_sent c10 c8 c7 c5 c4 c3 c2 > ' > @@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out > commits' ' > # not need to send any ancestors of "c3", but we still need to send "c3" > # itself. > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch && > + trace_fetch client origin to_fetch && > have_sent c5 c4^ c2side && > have_not_sent c4 c4^^ c4^^^ > ' > @@ -121,7 +134,7 @@ test_expect_success 'handle clock skew' ' > # and sent, because (due to clock skew) its only parent has already been > # popped off the priority queue. > test_config -C client fetch.negotiationalgorithm skipping && > -
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Are you sure that it's not well-defined? We open the path with O_APPEND, > > which means every write() will be atomically positioned at the end of > > file. So we would never lose or overwrite data. > > > > We do our own buffering in a strbuf, writing the result out in a single > > write() call (modulo the OS returning a short write, but that should not > > generally happen when writing short strings to a file). So we should get > > individual trace lines as atomic units. > > > > The order of lines from the two processes is undefined, of course. > > Correct. But I am more worried about the "mixed/overwriting" > breakage, if there is one; it means we may need to be prepared for > systems that lack O_APPEND that works correctly. I initially just > assumed that it was what Dscho was seeing, but after re-reading his > message, I am not sure anymore. > > I think the "do not trace the other side" approach you suggest for > these tests that only care about one side is more appropriate > solution for this particular case. We then do not have to worry > about overwriting or output from both sides mixed randomly. Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets us pursue any fix for interleaved trace output on Windows without the pressure of an impending flaky test. My gut says that looking into making O_APPEND work there is going to be the nicest solution, but my gut is not very well versed in Windows subtleties. ;) -- >8 -- Subject: [PATCH] t5552: suppress upload-pack trace output The t5552 test script uses GIT_TRACE_PACKET to monitor what git-fetch sends and receives. However, because we're accessing a local repository, the child upload-pack also sends trace output to the same file. On Linux, this works out OK. We open the trace file with O_APPEND, so all writes are atomically positioned at the end of the file. No data can be overwritten or omitted. 
And since we prepare our small writes in a strbuf and write them with a single write(), we should see each line as an atomic unit. The order of lines between the two processes is undefined, but the test script greps only for "fetch>" or "fetch<" lines. So under Linux, the test results are deterministic. The test fails intermittently on Windows, however, reportedly even overwriting bits of the output file (i.e., O_APPEND does not seem to give us an atomic position+write). Since the test only cares about the trace output from fetch, we can just disable the output from upload-pack. That doesn't solve the greater question of O_APPEND/trace issues under Windows, but it easily fixes the flakiness from this test. Reported-by: Johannes Schindelin Signed-off-by: Jeff King --- I'm assuming that this really isn't triggerable on Linux. I tried and couldn't manage to get it to fail, and the reasoning above explains why. But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows. t/t5552-skipping-fetch-negotiator.sh | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh index 0a8e0e42ed..0ad50dd839 100755 --- a/t/t5552-skipping-fetch-negotiator.sh +++ b/t/t5552-skipping-fetch-negotiator.sh @@ -28,6 +28,19 @@ have_not_sent () { done } +# trace_fetch <client> <server> [args] +# +# Trace the packet output of fetch, but make sure we disable the variable +# in the child upload-pack, so we don't combine the results in the same file. 
+trace_fetch () { + client=$1; shift + server=$1; shift + GIT_TRACE_PACKET="$(pwd)/trace" \ + git -C "$client" fetch \ + --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \ + "$server" "$@" +} + test_expect_success 'commits with no parents are sent regardless of skip distance' ' git init server && test_commit -C server to_fetch && @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc # "c1" has no parent, it is still sent as "have" even though it would # normally be skipped. test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c7 c5 c2 c1 && have_not_sent c6 c4 c3 ' @@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the larger one' ' # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4" # (from "c5side" skip 1). test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c5side c11 c9 c6 c1 && have_not_sent c10 c8 c7 c5 c4 c3 c2 ' @@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Thu, Aug 09, 2018 at 02:08:56PM -0700, Junio C Hamano wrote: > > Correct. But I am more worried about the "mixed/overwriting" > > breakage, if there is one; it means we may need to be prepared for > > systems that lack O_APPEND that works correctly. I initially just > > assumed that it was what Dscho was seeing, but after re-reading his > > message, I am not sure anymore. > > > > I think the "do not trace the other side" approach you suggest for > > these tests that only care about one side is more appropriate > > solution for this particular case. We then do not have to worry > > about overwriting or output from both sides mixed randomly. > > A concluding sentence I forgot to add, after saying "this is simpler > and better to fix test breakage", was > > But if we really are seeing O_APPEND breakage, a mandatory > locking mechanism like this one might be necessary to work > around it (I seriously hope we do not have to, though). > > Sorry for an additional noise. Yeah, my earlier email said "if we just care about this test, then...". But if we do care about making this multiple-tracing case work all the time, then we'd need some solution. If there's something that can work like O_APPEND, even if we have to make some system-specific call, I'd prefer that to locking. I did some searching for atomic append on Windows and got mixed reports on whether their FILE_APPEND_DATA does what we want or not. Knowing nothing about msys, I'd _assume_ that O_APPEND would get translated to that flag, but I got lost trying to find it in git-for-windows/msys2-runtime. Wouldn't it be wonderful if the solution were as simple as making sure that flag was plumbed through? I seriously doubt it will be that easy, though. :) -Peff
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Junio C Hamano writes: > Jeff King writes: > >> Are you sure that it's not well-defined? We open the path with O_APPEND, >> which means every write() will be atomically positioned at the end of >> file. So we would never lose or overwrite data. >> >> We do our own buffering in a strbuf, writing the result out in a single >> write() call (modulo the OS returning a short write, but that should not >> generally happen when writing short strings to a file). So we should get >> individual trace lines as atomic units. >> >> The order of lines from the two processes is undefined, of course. > > Correct. But I am more worried about the "mixed/overwriting" > breakage, if there is one; it means we may need to be prepared for > systems that lack O_APPEND that works correctly. I initially just > assumed that it was what Dscho was seeing, but after re-reading his > message, I am not sure anymore. > > I think the "do not trace the other side" approach you suggest for > these tests that only care about one side is more appropriate > solution for this particular case. We then do not have to worry > about overwriting or output from both sides mixed randomly. A concluding sentence I forgot to add, after saying "this is simpler and better to fix test breakage", was But if we really are seeing O_APPEND breakage, a mandatory locking mechanism like this one might be necessary to work around it (I seriously hope we do not have to, though). Sorry for an additional noise.
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Jeff King writes:

> Are you sure that it's not well-defined? We open the path with O_APPEND,
> which means every write() will be atomically positioned at the end of
> file. So we would never lose or overwrite data.
>
> We do our own buffering in a strbuf, writing the result out in a single
> write() call (modulo the OS returning a short write, but that should not
> generally happen when writing short strings to a file). So we should get
> individual trace lines as atomic units.
>
> The order of lines from the two processes is undefined, of course.

Correct. But I am more worried about the "mixed/overwriting"
breakage, if there is one; it means we may need to be prepared for
systems that lack O_APPEND that works correctly. I initially just
assumed that it was what Dscho was seeing, but after re-reading his
message, I am not sure anymore.

I think the "do not trace the other side" approach you suggest for
these tests that only care about one side is more appropriate
solution for this particular case. We then do not have to worry
about overwriting or output from both sides mixed randomly.

> diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
> index 3b60bd44e0..5ad5bece55 100755
> --- a/t/t5552-skipping-fetch-negotiator.sh
> +++ b/t/t5552-skipping-fetch-negotiator.sh
> @@ -28,6 +28,19 @@ have_not_sent () {
>  	done
>  }
>  
> +# trace_fetch <client> <server> [args]
> +#
> +# Trace the packet output of fetch, but make sure we disable the variable
> +# in the child upload-pack, so we don't combine the results in the same file.
> +trace_fetch () {
> +	client=$1; shift
> +	server=$1; shift
> +	GIT_TRACE_PACKET="$(pwd)/trace" \
> +	git -C "$client" fetch \
> +		--upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
> +		"$server" "$@"
> +}
> +
>  test_expect_success 'commits with no parents are sent regardless of skip distance' '
>  	git init server &&
>  	test_commit -C server to_fetch &&
> @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc
>  	# "c1" has no parent, it is still sent as "have" even though it would
>  	# normally be skipped.
>  	test_config -C client fetch.negotiationalgorithm skipping &&
> -	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> +	trace_fetch client "$(pwd)/server" &&
>  	have_sent c7 c5 c2 c1 &&
>  	have_not_sent c6 c4 c3
>  '
> @@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the larger one' '
>  	# the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
>  	# (from "c5side" skip 1).
>  	test_config -C client fetch.negotiationalgorithm skipping &&
> -	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> +	trace_fetch client "$(pwd)/server" &&
>  	have_sent c5side c11 c9 c6 c1 &&
>  	have_not_sent c10 c8 c7 c5 c4 c3 c2
>  '
> @@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out commits' '
>  	# not need to send any ancestors of "c3", but we still need to send "c3"
>  	# itself.
>  	test_config -C client fetch.negotiationalgorithm skipping &&
> -	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
> +	trace_fetch client origin to_fetch &&
>  	have_sent c5 c4^ c2side &&
>  	have_not_sent c4 c4^^ c4^^^
>  '
> @@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' '
>  	# and sent, because (due to clock skew) its only parent has already been
>  	# popped off the priority queue.
>  	test_config -C client fetch.negotiationalgorithm skipping &&
> -	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> +	trace_fetch client "$(pwd)/server" &&
>  	have_sent c2 c1 old4 old2 old1 &&
>  	have_not_sent old3
>  '
> @@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of commits that server AC
>  	test_commit -C server commit-on-b1 &&
>  
>  	test_config -C client fetch.negotiationalgorithm skipping &&
> -	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" to_fetch &&
> +	trace_fetch client "$(pwd)/server" to_fetch &&
>  	grep " fetch" trace &&
>  
>  	# fetch-pack sends 2 requests each containing 16 "have" lines before
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Thu, Aug 09, 2018 at 10:35:24AM -0700, Johannes Schindelin via GitGitGadget wrote:

> The culprit is that two processes try simultaneously to write to the same
> file specified via GIT_TRACE_PACKET, and it is not well defined how that
> should work, even on Linux.

Are you sure that it's not well-defined? We open the path with O_APPEND,
which means every write() will be atomically positioned at the end of
file. So we would never lose or overwrite data.

We do our own buffering in a strbuf, writing the result out in a single
write() call (modulo the OS returning a short write, but that should not
generally happen when writing short strings to a file). So we should get
individual trace lines as atomic units.

The order of lines from the two processes is undefined, of course.

> This patch series addresses that by locking the trace fd. I chose to use
> flock() instead of fcntl() because the Win32 API LockFileEx()
> [https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex]
> (which does exactly what I want in this context) has much more similar
> semantics to the former than the latter.

I must admit I'm not excited to see us adding extra locking and
complexity when it is not necessarily needed. Is this a feature we
actually care about delivering for normal users of GIT_TRACE, or do we
just care about solving the test flakiness here? Because we should be
able to do the latter with the patch below.

I also wonder if we should be clearing GIT_TRACE as part of clearing
repo-env. It would do what we want automatically in this case, though
there are definitely cases where I prefer the current behavior (because
I really do want to see the upload-pack side of it). Of course I could
always use "--upload-pack 'GIT_TRACE_PACKET=... upload-pack'", so this
is really just a default.

diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
index 3b60bd44e0..5ad5bece55 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -28,6 +28,19 @@ have_not_sent () {
 	done
 }
 
+# trace_fetch <client> <server> [args]
+#
+# Trace the packet output of fetch, but make sure we disable the variable
+# in the child upload-pack, so we don't combine the results in the same file.
+trace_fetch () {
+	client=$1; shift
+	server=$1; shift
+	GIT_TRACE_PACKET="$(pwd)/trace" \
+	git -C "$client" fetch \
+		--upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
+		"$server" "$@"
+}
+
 test_expect_success 'commits with no parents are sent regardless of skip distance' '
 	git init server &&
 	test_commit -C server to_fetch &&
@@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc
 	# "c1" has no parent, it is still sent as "have" even though it would
 	# normally be skipped.
 	test_config -C client fetch.negotiationalgorithm skipping &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+	trace_fetch client "$(pwd)/server" &&
 	have_sent c7 c5 c2 c1 &&
 	have_not_sent c6 c4 c3
 '
@@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the larger one' '
 	# the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
 	# (from "c5side" skip 1).
 	test_config -C client fetch.negotiationalgorithm skipping &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+	trace_fetch client "$(pwd)/server" &&
 	have_sent c5side c11 c9 c6 c1 &&
 	have_not_sent c10 c8 c7 c5 c4 c3 c2
 '
@@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out commits' '
 	# not need to send any ancestors of "c3", but we still need to send "c3"
 	# itself.
 	test_config -C client fetch.negotiationalgorithm skipping &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
+	trace_fetch client origin to_fetch &&
 	have_sent c5 c4^ c2side &&
 	have_not_sent c4 c4^^ c4^^^
 '
@@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' '
 	# and sent, because (due to clock skew) its only parent has already been
 	# popped off the priority queue.
 	test_config -C client fetch.negotiationalgorithm skipping &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+	trace_fetch client "$(pwd)/server" &&
 	have_sent c2 c1 old4 old2 old1 &&
 	have_not_sent old3
 '
@@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of commits that server AC
 	test_commit -C server commit-on-b1 &&
 
 	test_config -C client fetch.negotiationalgorithm skipping &&
-	GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" to_fetch &&
+	trace_fetch client "$(pwd)/server" to_fetch &&
 	grep " fetch" trace &&
 
 	# fetch-pack sends 2 requests each containing 16 "have"
[PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
I reported a couple of times that t5552 is not passing reliably. It has
now reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a
lot more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the
same file specified via GIT_TRACE_PACKET, and it is not well defined
how that should work, even on Linux.

This patch series addresses that by locking the trace fd. I chose to
use flock() instead of fcntl() because the Win32 API LockFileEx()
[https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex]
(which does exactly what I want in this context) has much more similar
semantics to the former than the latter.

Of course, I have to admit that I am not super solid on flock()
semantics, and I also do not know which conditional blocks in
config.mak.uname should grow a HAVE_FLOCK = YesWeDo line, still.
Reviewers knowledgeable in flock() semantics: I would be much indebted
if you helped me there.

Also: is it safe to call flock() on file descriptors referring not to
files, but, say, pipes or an interactive terminal?
Johannes Schindelin (4):
  Introduce a function to lock/unlock file descriptors when appending
  mingw: implement lock_or_unlock_fd_for_appending()
  trace: lock the trace file to avoid racy trace_write() calls
  trace: verify that locking works

 Makefile               |   1 +
 compat/mingw.c         |  19 ++
 compat/mingw.h         |   3 +
 config.mak.uname       |   3 +
 git-compat-util.h      |   2 +
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/helper/test-trace.c  | 130 +
 t/t0070-fundamental.sh |   6 ++
 trace.c                |  11 +++-
 wrapper.c              |  14 +
 11 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-trace.c

base-commit: 42cc7485a2ec49ecc440c921d2eb0cae4da80549
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-17%2Fdscho%2Ffetch-negotiator-skipping-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-17/dscho/fetch-negotiator-skipping-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/17
-- 
gitgitgadget
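[Editor's illustration, not part of the thread: for readers unfamiliar with the flock() semantics being asked about, the shell-level equivalent of the proposed locking is to hold an exclusive lock on a side file around each append. This sketch uses flock(1) from util-linux (an assumption about the reader's system; the series itself adds a C-level helper, not this):]

```shell
# Two writers serialize their appends by taking an exclusive flock()
# on a dedicated lock file for the duration of each write.
rm -f out.log out.lock
for w in A B; do
	(
		i=0
		while [ "$i" -lt 100 ]; do
			(
				flock 9		# blocks until fd 9 holds the exclusive lock
				echo "$w $i" >>out.log
			) 9>>out.lock	# lock is released when fd 9 is closed
			i=$((i + 1))
		done
	) &
done
wait
wc -l <out.log
```

The fd-based form (`9>>out.lock` around `flock 9`) releases the lock automatically when the subshell exits, so there is no unlock step to forget; that auto-release on close is also part of why flock() maps onto LockFileEx() more naturally than fcntl() record locks do.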
Re: [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges
On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget wrote: > Now, I am fairly certain that the changes are correct, but given my track > record with off-by-one bugs (and once even an off-by-two bug), I would > really appreciate some thorough review of this code, in particular the > second one that is the actual bug fix. I am specifically interested in > reviews from people who know line-log.c pretty well and can tell me whether > the src[i].end > target[j].end is correct, or whether it should actually > have been a >= (I tried to wrap my head around this, but I would feel more > comfortable if a domain expert would analyze this, whistling, and looking > Eric's way). Although I fixed a number of bugs in basic range-set manipulation code (5 years ago), I have no experience with or knowledge of the code actually dealing with range-set manipulation in relation to diffs, so I'm not a domain expert by any means, and I'm _still_ trying to wrap my head around that code. That said, I left some comments on the individual patches suggesting a simpler and more correct fix for the crash. (Despite that suggestion for fixing the crash, I still can't say whether the actual computations performed by the existing code are correct, since I still don't understand that aspect of it.)
[PATCH 0/4] line-log: be more careful when adjusting multiple line ranges
I am a heavy user of "git log -L". In fact, I use the feature where
multiple ranges can be specified extensively, via a not-exactly-trivial
shell script function that takes the currently staged changes (or, if
none are staged, the current unstaged changes) and turns them into the
corresponding commit history:

staged_log () {
	diff="$(git diff --cached -U1)"
	test -n "$diff" || diff="$(git diff -U1)"
	test -n "$diff" || die "No changes"
	eval "git log $(echo "$diff" |
		sed -ne '/^--- a\//{s/^-* a\/\(.*\)/'\''\1'\''/;x}' -e \
			'/^@@ -/{s/^@@ -\([^, ]*\),\([^ ]*\).*/-L \1,+\2/;G;s/\n/:/g;p}' |
		tr '\n' ' ')"
}

This is an extremely useful way to look at the history, especially when
trying to fix up a commit deep in a long branch (or a thicket of
branches).

Today, however, this method failed me, by greeting me with an
assertion. When I tried to paper over that assertion by joining line
ranges that became adjacent (or overlapping), it still produced a
segmentation fault when the line-log tried to print lines past the file
contents. So I had no choice but to fix this properly.

I still wanted to keep the optimization where multiple line ranges are
joined into a single one (I am convinced that this also affects the
output, where previously multiple hunks would have been displayed, but
I ran out of time to investigate this). This is the 3rd patch. It is
not purely an optimization, as the assertion would still trigger when
line ranges could be joined.

Now, I am fairly certain that the changes are correct, but given my
track record with off-by-one bugs (and once even an off-by-two bug), I
would really appreciate some thorough review of this code, in
particular the second one that is the actual bug fix.
I am specifically interested in reviews from people who know line-log.c
pretty well and can tell me whether the src[i].end > target[j].end is
correct, or whether it should actually have been a >= (I tried to wrap
my head around this, but I would feel more comfortable if a domain
expert would analyze this, whistling, and looking Eric's way).

Cc: Eric Sunshine <sunsh...@sunshineco.com>

Johannes Schindelin (4):
  line-log: demonstrate a bug with nearly-overlapping ranges
  line-log: adjust start/end of ranges individually
  line-log: optimize ranges by joining them when possible
  line-log: convert an assertion to a full BUG() call

 line-log.c          | 18 +++---
 t/t4211-line-log.sh | 17 +
 2 files changed, 32 insertions(+), 3 deletions(-)

base-commit: 1d89318c48d233d52f1db230cf622935ac3c69fa
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-15%2Fdscho%2Fline-log-fix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-15/dscho/line-log-fix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/15
-- 
gitgitgadget
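[Editor's illustration, not part of the thread: the sed pipeline inside the staged_log helper at the top of this cover letter is dense. Here is what it does to a single hunk, fed a hypothetical file name and hunk header chosen for illustration:]

```shell
# Feed one "--- a/<file>" header plus one "@@" hunk header through the
# same two sed expressions staged_log uses: the first stashes the quoted
# file name in the hold space, the second turns "-<start>,<len>" into a
# "-L <start>,+<len>:<file>" argument suitable for "git log".
range=$(printf '%s\n' '--- a/line-log.c' '@@ -100,8 +100,9 @@ static void f(void)' |
	sed -ne '/^--- a\//{s/^-* a\/\(.*\)/'\''\1'\''/;x}' -e \
		'/^@@ -/{s/^@@ -\([^, ]*\),\([^ ]*\).*/-L \1,+\2/;G;s/\n/:/g;p}')
printf '%s\n' "$range"   # -L 100,+8:'line-log.c'
```

With several hunks, each "@@" line yields one such -L argument, which is how a single staged_log call ends up exercising the multiple-ranges code path that this series fixes.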
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Johannes Schindelin writes: > Hi Olga, > > On Fri, 13 Jul 2018, Оля Тележная wrote: > >> 2018-07-09 11:27 GMT+03:00 Оля Тележная : >> > Hello everyone, >> > This is my new attempt to start using oid_object_info_extended() in >> > ref-filter. You could look at previous one [1] [2] but it is not >> > necessary. >> > >> > The goal (still) is to improve performance by avoiding calling expensive >> > functions when we don't need the information they provide >> > or when we could get it by using a cheaper function. >> > >> > This patch is a middle step. In the end, I want to add new atoms >> > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in >> > cat-file command. >> > >> > I also know about problems with memory leaks in ref-filter: that would >> > be my next task that I will work on. Since I did not generate any new >> > leaks in this patch (just use existing ones), I decided to put this >> > part on a review and fix leaks as a separate task. >> >> UPDATES since v1: >> add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin) >> improve second commit message (thanks to Junio C Hamano) >> add static keyword (thanks to Ramsay Jones) >> >> > >> > Thank you! >> > >> > [1] https://github.com/git/git/pull/493 > > Could you please populate the description of that PR so that SubmitGit > picks it up as cover letter? Thanks for suggesting that. Yes, an updated version of a series, even if it is a small one with just 4 or 5 patches, becomes much easier to read with a well-written cover letter.
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Hi Olga, On Fri, 13 Jul 2018, Оля Тележная wrote: > 2018-07-09 11:27 GMT+03:00 Оля Тележная : > > Hello everyone, > > This is my new attempt to start using oid_object_info_extended() in > > ref-filter. You could look at previous one [1] [2] but it is not > > necessary. > > > > The goal (still) is to improve performance by avoiding calling expensive > > functions when we don't need the information they provide > > or when we could get it by using a cheaper function. > > > > This patch is a middle step. In the end, I want to add new atoms > > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in > > cat-file command. > > > > I also know about problems with memory leaks in ref-filter: that would > > be my next task that I will work on. Since I did not generate any new > > leaks in this patch (just use existing ones), I decided to put this > > part on a review and fix leaks as a separate task. > > UPDATES since v1: > add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin) > improve second commit message (thanks to Junio C Hamano) > add static keyword (thanks to Ramsay Jones) > > > > > Thank you! > > > > [1] https://github.com/git/git/pull/493 Could you please populate the description of that PR so that SubmitGit picks it up as cover letter? Thanks, Johannes > > [2] > > https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/ >
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
2018-07-09 11:27 GMT+03:00 Оля Тележная :
> Hello everyone,
> This is my new attempt to start using oid_object_info_extended() in
> ref-filter. You could look at previous one [1] [2] but it is not
> necessary.
>
> The goal (still) is to improve performance by avoiding calling expensive
> functions when we don't need the information they provide
> or when we could get it by using a cheaper function.
>
> This patch is a middle step. In the end, I want to add new atoms
> ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
> cat-file command.
>
> I also know about problems with memory leaks in ref-filter: that would
> be my next task that I will work on. Since I did not generate any new
> leaks in this patch (just use existing ones), I decided to put this
> part on a review and fix leaks as a separate task.

UPDATES since v1:
 - add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
 - improve second commit message (thanks to Junio C Hamano)
 - add static keyword (thanks to Ramsay Jones)

> Thank you!
>
> [1] https://github.com/git/git/pull/493
> [2] https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/
[PATCH 0/4] test-lint: detect one-shot "FOO=bar shell_func"
One-shot "FOO=bar cmd" environment variable assignments exist only during invocation of 'cmd'. However, if 'cmd' is a shell function, then the variable is assigned in the running shell and exists until the process exits (or is unset explicitly). Such a side-effect is almost certainly unintended by a test author and is likely due to lack of familiarity with the problem. Upgrade "make test-lint" to detect this sort of suspect usage. Also fix a couple instances of "FOO=bar shell_func" detected by the improved linting. This series is built atop 'jc/t3404-one-shot-export-fix'[1]. [1]: https://public-inbox.org/git/xmqqefg8w73c@gitster-ct.c.googlers.com/T/ Eric Sunshine (4): t6046/t9833: fix use of "VAR=VAL cmd" with a shell function t/check-non-portable-shell: stop being so polite t/check-non-portable-shell: make error messages more compact t/check-non-portable-shell: detect "FOO=bar shell_func" t/check-non-portable-shell.pl | 31 +- t/t6046-merge-skip-unneeded-updates.sh | 4 +++- t/t9833-errors.sh | 4 +++- 3 files changed, 32 insertions(+), 7 deletions(-) -- 2.18.0.233.g985f88cf7e
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Hi Olga, On Mon, 9 Jul 2018, Оля Тележная wrote: > [2] > https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/ This type of Message-Id makes me think that you used SubmitGit to send this patch series. The main problem I see here is that the patches are not sent as replies to this cover letter, and therefore they are seemingly disconnected on the mailing list. It was also my impression that SubmitGit started supporting sending cover letters, in which case you would not have to jump through hoops to thread the mails properly. But for that to work, the PR has to have a description which is then used as cover letter. I do not see any description in https://github.com/git/git/pull/520, though. Maybe provide one? Ciao, Johannes P.S.: You might have noticed that I am working (slowly, but steadily) on a contender for SubmitGit that I call GitGitGadget. Originally, I really wanted to enhance SubmitGit instead because I am a big believer of *not* reinventing the wheel (so much energy gets wasted that way). However, in this case the limitations of the chosen language (I do not want to learn Scala, I have absolutely zero need to know Scala in any of my other endeavors, and my time to learn new things is limited, so I spend it wisely) and the limitations of the design (the UI is completely separate from GitHub, you have to allow Amazon to send mails in your name, and SubmitGit's design makes it impossible to work bi-directionally, it is only GitHub -> mailing list, while I also want the option to add replies on the mailing list as comments to the GitHub PR in the future) made me reconsider. If you want to kick the tires, so to say, I welcome you to give GitGitGadget a try. It would require only a couple of things from you: - You would have to settle for a branch name, and then not open new PRs for every iteration you want to send, but force-push the branch instead. 
 - You would have to open a PR at https://github.com/gitgitgadget/git.

 - You would have to provide the cover letter via the PR's description
   (and update that description before sending newer iterations).

 - I would have to add you to the list of users allowed to send patches
   via GitGitGadget (GitGitGadget has some really light-weight access
   control to prevent spamming).

 - You would then send a new iteration by simply adding a comment to
   your PR that contains this command: /submit

 - To integrate well with previous patch series iterations (i.e. to
   connect the threads), I would have to come up with a little bit of
   tooling to add some metadata that I have to reconstruct manually
   from your previously-sent iterations.
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Оля Тележная writes: > Hello everyone, > This is my new attempt to start using oid_object_info_extended() in > ref-filter. You could look at previous one [1] [2] but it is not > necessary. Yup, it sounds like a sensible thing to do to try asking object-info helper instead of reading the whole object in-core and inspecting it ourselves when we can avoid it.
[PATCH 0/4] Use oid_object_info() instead of read_object_file()
Hello everyone, This is my new attempt to start using oid_object_info_extended() in ref-filter. You could look at previous one [1] [2] but it is not necessary. The goal (still) is to improve performance by avoiding calling expensive functions when we don't need the information they provide or when we could get it by using a cheaper function. This patch is a middle step. In the end, I want to add new atoms ("objectsize:disk" and "deltabase") and reuse ref-filter logic in cat-file command. I also know about problems with memory leaks in ref-filter: that would be my next task that I will work on. Since I did not generate any new leaks in this patch (just use existing ones), I decided to put this part on a review and fix leaks as a separate task. Thank you! [1] https://github.com/git/git/pull/493 [2] https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/
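[Editor's illustration, not part of the thread: the kind of query oid_object_info() answers cheaply can already be seen from the cat-file plumbing, which sits on top of it. A sketch in a scratch repository (no commits or user config needed); it assumes git is installed:]

```shell
# Ask for the type and size of a blob without materializing its content
# in the caller: cat-file --batch-check goes through the object-info
# machinery rather than reading the whole object as with
# read_object_file().
git init -q demo
cd demo
oid=$(echo hello | git hash-object -w --stdin)
info=$(echo "$oid" | git cat-file --batch-check='%(objecttype) %(objectsize)')
printf '%s\n' "$info"    # blob 6
```

The series' end goal is to expose the same cheap queries (including %(objectsize:disk) and %(deltabase), which cat-file's batch format already offers) as ref-filter atoms.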
Re: [RFC PATCH 0/4] Fix occasional test failures in http tests
On Thu, Jun 14, 2018 at 02:31:03PM +0200, SZEDER Gábor wrote: > 't5561-http-backend.sh' is prone to occasional failures; luckily it's > not 'git-http-backend's fault, but the test script is a bit racy. I > won't go into the details here, patch 4/4's commit message discusses > it at length. 4/4 also fixes this issue, but I'm not particularly > happy with that fix, the note attached to that patch explains why, > along with possible alternatives, hence the RFC. > > Even if we settle for a different fix, I think the first two patches > are worthy cleanups on their own right. Yes, the first two look like improvements on their own (and I left more detailed comments on the final one). Thanks for tackling this! -Peff
Re: [RFC PATCH 0/4] Fix occasional test failures in http tests
SZEDER Gábor writes:

> 't5561-http-backend.sh' is prone to occasional failures; luckily it's
> not 'git-http-backend's fault, but the test script is a bit racy. I
> won't go into the details here, patch 4/4's commit message discusses
> it at length. 4/4 also fixes this issue, but I'm not particularly
> happy with that fix, the note attached to that patch explains why,
> along with possible alternatives, hence the RFC.

Interesting.

> Even if we settle for a different fix, I think the first two patches
> are worthy cleanups on their own right.

Yeah, the first two are unquestionably good. The sed expression of
the second one could use sq around (instead of dq) to lose
hard-to-read backslash quoting from there, but it probably is OK to
keep the second one a pure "refactoring" patch, and leave it to a
separate "further cleanup" patch to make such a change.

> SZEDER Gábor (4):
>   t5541: avoid empty line when truncating access log
>   t/lib-httpd: add the strip_access_log() helper function
>   t/lib-httpd: add minor and patch version to $HTTPD_VERSION
>   t/lib-httpd: sort log based on timestamp to avoid occasional failure
>
>  t/lib-httpd.sh              | 24 ++--
>  t/lib-httpd/apache.conf     |  5 +
>  t/t5541-http-push-smart.sh  | 17 +++--
>  t/t5551-http-fetch-smart.sh |  7 +--
>  t/t5561-http-backend.sh     |  7 +--
>  5 files changed, 32 insertions(+), 28 deletions(-)
[RFC PATCH 0/4] Fix occasional test failures in http tests
't5561-http-backend.sh' is prone to occasional failures; luckily it's
not 'git-http-backend's fault, but the test script is a bit racy. I
won't go into the details here, patch 4/4's commit message discusses it
at length. 4/4 also fixes this issue, but I'm not particularly happy
with that fix, the note attached to that patch explains why, along with
possible alternatives, hence the RFC.

Even if we settle for a different fix, I think the first two patches
are worthy cleanups on their own right.

SZEDER Gábor (4):
  t5541: avoid empty line when truncating access log
  t/lib-httpd: add the strip_access_log() helper function
  t/lib-httpd: add minor and patch version to $HTTPD_VERSION
  t/lib-httpd: sort log based on timestamp to avoid occasional failure

 t/lib-httpd.sh              | 24 ++--
 t/lib-httpd/apache.conf     |  5 +
 t/t5541-http-push-smart.sh  | 17 +++--
 t/t5551-http-fetch-smart.sh |  7 +--
 t/t5561-http-backend.sh     |  7 +--
 5 files changed, 32 insertions(+), 28 deletions(-)
-- 
2.18.0.rc0.207.ga6211da864
[PATCH 0/4] color.ui docs & add color.ui=isatty
I started writing this for the reasons explained in 4/4, but the
justification there is a bit of a stretch: it is hard to see where we'd
realistically want color.ui=auto and color.pager=false to diverge. But
since I wrote it already, maybe some people will come out of the
woodwork telling me that this is what they've always wanted for
whatever reason.

Maybe if we don't like 4/4 we should take 3/4 because we might want
another such option in the future, but that's probably overly hedging
our bets, still. I really don't like this pattern where we have a
multi-value config option that dies instead of warning on unknown
future values, so I'm leaning towards saying that change should be
accepted into git.

But while I was working towards 4/4 I did some nice fixes in [12]/4. I
think those should go in regardless, so they're non-RFC.

Ævar Arnfjörð Bjarmason (4):
  config doc: move color.ui documentation to one place
  config doc: clarify "to a terminal" in color.ui
  color.ui config: don't die on unknown values
  color.ui config: add "isatty" setting

 Documentation/config.txt | 100 +--
 color.c                  |  25 --
 color.h                  |   1 +
 t/t7006-pager.sh         |  31 
 4 files changed, 107 insertions(+), 50 deletions(-)
-- 
2.17.0.290.gded63e768a
Re: [PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation
Ævar Arnfjörð Bjarmason writes:

> On Thu, May 24, 2018 at 9:02 PM, Jeff King wrote:
>> On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> That doesn't work, because that's for the server-side, but I need the
>>> fetch.fsck.* that doesn't exist. This works (I'll send a better patch
>>> with tests / docs etc. soon):
>>
>> Yeah, I think this is the right direction.

It seems that this 4-patch series is sufficiently commented and
earlier parts can be further polished soonish using help made by
others. I need to re-read the last patch (implementation) but I think
this generally is good.

Thanks.
[PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation
On Thu, May 24, 2018 at 9:02 PM, Jeff Kingwrote: > On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> That doesn't work, because that's for the server-side, but I need the >> fetch.fsck.* that doesn't exist. This works (I'll send a better patch >> with tests / docs etc. soon): > > Yeah, I think this is the right direction. > >> +static int fetch_pack_config_cb(const char *var, const char *value, void >> *cb) >> +{ >> + if (strcmp(var, "fetch.fsck.skiplist") == 0) { >> + const char *path; >> + >> + if (git_config_pathname(, var, value)) >> + return 1; >> + strbuf_addf(_msg_types, "%cskiplist=%s", >> + fsck_msg_types.len ? ',' : '=', path); >> + free((char *)path); >> + return 0; >> + } >> + >> + if (skip_prefix(var, "fetch.fsck.", )) { >> + if (is_valid_msg_type(var, value)) >> + strbuf_addf(_msg_types, "%c%s=%s", >> + fsck_msg_types.len ? ',' : '=', var, value); >> + else >> + warning("Skipping unknown msg id '%s'", var); >> + return 0; >> + } > > This matches what's in receive-pack, though the way we stuff all of the > options into a flat string is kind of nasty. I also wonder if we'd > eventually run up against command-line limits if somebody had a > complicated fsck config. Yeah, but I'm leaving this for the future. I doubt that it's going to happen in practice, although if you have a huge skip-list... > I wonder if we should simply be telling index-pack "please read fsck > config from the prefix 'fetch.fsck'" instead. I think this whole notion of reading the same config from two places sucks, and now with my patches it's three. But I can't think of a reasonable way to make it better without even more complexity, and maybe it's better to split it up anyway, i.e. the stuff you want to accept is different that fsck and fetch. > I dunno, maybe I am just creating work. Certainly I don't think it > should be a blocker for adding fetch.fsck support. But if you want to > think about it while you are here or write a patch, I wouldn't complain. 
:) Sorry, too late. I already wrote all of this :) Ævar Arnfjörð Bjarmason (4): config doc: don't describe *.fetchObjects twice config doc: unify the description of fsck.* and receive.fsck.* config doc: elaborate on what transfer.fsckObjects does fetch: implement fetch.fsck.* Documentation/config.txt| 113 fetch-pack.c| 32 - t/t5504-fetch-receive-strict.sh | 46 + 3 files changed, 148 insertions(+), 43 deletions(-) -- 2.17.0.290.gded63e768a
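[Editor's note] The "flat string" Peff objects to above is each fetch.fsck.* key/value joined into one comma-separated argument that is later handed to index-pack. A stand-alone shell sketch of just that joining step (the keys and values below are made up; the real code builds the same shape in a strbuf):

```shell
# Mimic fetch_pack_config_cb's string building: the first entry is
# preceded by '=', every later entry by ','.
msg_types=""
for kv in "missingEmail=warn" "badTagName=ignore" "skiplist=/tmp/skips"; do
    if [ -z "$msg_types" ]; then
        msg_types="=$kv"
    else
        msg_types="$msg_types,$kv"
    fi
done
echo "--strict$msg_types"
```

This prints a single `--strict=key=val,key=val,...` argument, which is exactly why a very large skip-list or fsck config could eventually bump into command-line length limits.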
Re: [GSoC][PATCH 0/4] rebase: split rebase -p from rebase -i
On Tue, May 22, 2018 at 6:31 AM, Alban Gruin wrote: > This splits the `rebase --preserve-merges` functionality from > git-rebase--interactive.sh. This is part of the effort to deprecate > preserve-merges. The new script, git-rebase--preserve-merges.sh, should be > left > to bitrot. All the dead code left by the duplication of > git-rebase--interactive.sh is also removed. ... and I thought the original motivation was getting the rest of rebase into a shape where rewriting it is easier; the potential bit rot of --preserve-merges is rather a side effect, but not the main goal. I commented on patch 1, as I don't quite understand the changes, the other patches look good to me. Thanks, Stefan
[GSoC][PATCH 0/4] rebase: split rebase -p from rebase -i
This splits the `rebase --preserve-merges` functionality from git-rebase--interactive.sh. This is part of the effort to deprecate preserve-merges. The new script, git-rebase--preserve-merges.sh, should be left to bitrot. All the dead code left by the duplication of git-rebase--interactive.sh is also removed. This patch series is based on js/sequencer-and-root-commits. Alban Gruin (4): rebase: duplicate git-rebase--interactive.sh to git-rebase--preserve-merges.sh rebase: strip unused code in git-rebase--preserve-merges.sh rebase: use the new git-rebase--preserve-merges.sh rebase: remove -p code from git-rebase--interactive.sh .gitignore |1 + Makefile |2 + git-rebase--interactive.sh | 708 +--- git-rebase--preserve-merges.sh | 1004 git-rebase.sh | 32 +- 5 files changed, 1040 insertions(+), 707 deletions(-) create mode 100644 git-rebase--preserve-merges.sh -- 2.16.1
Re: [PATCH 0/4] a few mark_parents_uninteresting cleanups
On 5/11/2018 2:00 PM, Jeff King wrote: This is a follow-up to the discussion from February: https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/ There I theorized that some of these extra has_object_file() checks were unnecessary. After poking around a bit, I've convinced myself that this is the case, so here are some patches. After Stolee's fix in ebbed3ba04 (revision.c: reduce object database queries, 2018-02-24), I doubt this will provide any noticeable speedup. IMHO the value is just in simplifying the logic. The first two patches are the actual has_object_file simplifications. The second two are an attempt to fix some head-scratching I had while reading mark_parents_uninteresting(). I hope the result is easier to follow, but I may have just shuffled one confusion for another. :) [1/4]: mark_tree_contents_uninteresting(): drop missing object check [2/4]: mark_parents_uninteresting(): drop missing object check [3/4]: mark_parents_uninteresting(): replace list with stack [4/4]: mark_parents_uninteresting(): avoid most allocation revision.c | 90 ++ 1 file changed, 50 insertions(+), 40 deletions(-) -Peff This series looks good to me. I found Patch 3 hard to read in the diff, so I just looked at the final result and the new arrangement is very clear about how it should behave. Reviewed-by: Derrick Stolee
[PATCH 0/4] a few mark_parents_uninteresting cleanups
This is a follow-up to the discussion from February: https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/ There I theorized that some of these extra has_object_file() checks were unnecessary. After poking around a bit, I've convinced myself that this is the case, so here are some patches. After Stolee's fix in ebbed3ba04 (revision.c: reduce object database queries, 2018-02-24), I doubt this will provide any noticeable speedup. IMHO the value is just in simplifying the logic. The first two patches are the actual has_object_file simplifications. The second two are an attempt to fix some head-scratching I had while reading mark_parents_uninteresting(). I hope the result is easier to follow, but I may have just shuffled one confusion for another. :) [1/4]: mark_tree_contents_uninteresting(): drop missing object check [2/4]: mark_parents_uninteresting(): drop missing object check [3/4]: mark_parents_uninteresting(): replace list with stack [4/4]: mark_parents_uninteresting(): avoid most allocation revision.c | 90 ++ 1 file changed, 50 insertions(+), 40 deletions(-) -Peff
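[Editor's note] Patch 3's "replace list with stack" converts the recursive ancestor-marking walk into an iterative one. A toy, self-contained sketch of that shape (the struct and names here are invented for illustration; git's real `struct commit` and revision machinery look nothing like this):

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PARENTS 2

/* Invented toy type; stands in for a commit with its parent links. */
struct commit {
    int uninteresting;
    int nr_parents;
    struct commit *parents[MAX_PARENTS];
};

/* Mark every ancestor of 'start' uninteresting without recursing:
 * unvisited parents go onto a growable explicit stack.  Pruning on
 * already-marked commits means each commit is pushed at most once. */
static void mark_parents_uninteresting(struct commit *start)
{
    size_t nr = 0, alloc = 16;
    struct commit **stack = malloc(alloc * sizeof(*stack));

    stack[nr++] = start;
    while (nr) {
        struct commit *cur = stack[--nr];
        int i;

        for (i = 0; i < cur->nr_parents; i++) {
            struct commit *p = cur->parents[i];
            if (p->uninteresting)
                continue; /* prune: this subgraph is already done */
            p->uninteresting = 1;
            if (nr == alloc) {
                alloc *= 2;
                stack = realloc(stack, alloc * sizeof(*stack));
            }
            stack[nr++] = p;
        }
    }
    free(stack);
}
```

The point of the conversion is exactly this pruning-plus-stack structure: no deep C recursion on long first-parent chains, and no revisiting of shared history.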
Re: [PATCH 0/4] doc: cleaning up instances of \--
On Wed, Apr 18, 2018 at 01:24:55PM +0900, Junio C Hamano wrote: > Martin Ågren writes: > > > This is a patch series to convert \-- to -- in our documentation. The > > first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to > > --option, 2015-05-13) to fix some instances that have appeared since. > > The other three patches deal with standalone "\--" which we can't > > always turn into "--" since it can be rendered as an em dash. > > All looked sensible. As you mentioned in [2/4], "\--::" that is > part of an enumeration appears in documentation for about a dozen > commands after the series, but I do not think we can avoid it. > > One thing that makes me wonder related to these patches is if a > newer AsciiDoc we assume lets us do without {litdd} macro. This > series and our earlier effort like 1c262bb7 ("doc: convert \--option > to --option", 2015-05-13) mentions that "\--word" is less pleasant > on the eyes than "--word", but the ugliness "two{litdd}words" has > over "two--words" is far worse than that, so... I think many cases that use {litdd} would be better off using literal backticks anyway (e.g., git-add.txt mentions the filename `git-add--interactive.perl`). There are certainly a few that can't, though (e.g., config.txt uses linkgit:git-web{litdd}browse[1]). I agree that "\--" is less ugly there (and seems to work on my modern asciidoc). There's some history on the litdd versus "\--" choice in 565e135a1e (Documentation: quote double-dash for AsciiDoc, 2011-06-29). That in turn references 2839478774 (Work around em-dash handling in newer AsciiDoc, 2010-08-23), but I wouldn't be surprised if all of that is now obsolete with our AsciiDoc 8+ requirement. -Peff PS Late review, I know, but the patches look good to me. :)
Re: [PATCH 0/4] subtree: move out of contrib
On Tue, May 01 2018, Johannes Schindelin wrote: > Hi Ævar, > > On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote: > >> I think at this point git-subtree is widely used enough to move out of >> contrib/, maybe others disagree, but patches are always better for >> discussion than patch-less ML posts. > > Sure, it is used widely enough. > > However, it flies in the face of so many GSoC efforts to introduce yet > another one of those poorly portable Unix shell scripts, as central part > of Git's code base. > > The script itself does look quite straight-forward to port to a builtin, > so why not give it a try? That's a valid point. I think it makes sense to leave that aside for now, maybe the consensus is that subtree is fine in every way except we'd like to have a policy not to introduce new shellscript built-ins. Let's first just assume it's in C already and look at it in terms of its functionality, to figure out if it's worth even getting to that point. > If you are completely opposed to porting it to C, I will be completely > opposed to moving it out of contrib/. This series shows that we should split the concern about whether something lives in contrib/ from whether it's built/installed by default. No matter if we decide that subtree should be a blessed default command it makes sense to move it out of contrib, purely because as can be seen from this series it'll replace >100 lines of hacks with 1 line in our main Makefile. We can then just e.g. add a flag to guard for it, e.g. CONTRIB_SUBTREE=YesPlease. But that's just an internal implementation detail of how we manage code sitting in git.git.
Re: [PATCH 0/4] subtree: move out of contrib
Hi Ævar, On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote: > I think at this point git-subtree is widely used enough to move out of > contrib/, maybe others disagree, but patches are always better for > discussion than patch-less ML posts. Sure, it is used widely enough. However, it flies in the face of so many GSoC efforts to introduce yet another one of those poorly portable Unix shell scripts, as central part of Git's code base. The script itself does look quite straight-forward to port to a builtin, so why not give it a try? If you are completely opposed to porting it to C, I will be completely opposed to moving it out of contrib/. If you need help with porting it, please come up with a task plan and I can jump in, to help (but please do collaborate with me on this one, don't leave all of the hard work to me). Ciao, Dscho
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 11:50 AM, Ævar Arnfjörð Bjarmason wrote: > I think at this point git-subtree is widely used enough to move out of > contrib/, maybe others disagree, but patches are always better for > discussion than patch-less ML posts. After narrow/partial clone becomes real, it should be "easy" to implement some sort of narrow checkout that would achieve the same thing. But it took me forever with all other stuff to get back to this. If we remove it from contrib and there are people willing to update/maintain it, should it be a separate repository then? The willing people will have much more freedom to update it. And I don't have to answer the questions about who will maintain this thing in git.git. I don't like the idea of adding new (official) shell-based commands either to be honest. -- Duy
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 6:21 PM, Ævar Arnfjörð Bjarmason wrote: > Pretty clear it's garbage data, unless we're to believe that the > relative interest of submodules in the US, Germany and Sweden is 51, 64 > & 84, but 75, 100 and 0 for subtree. Oh yeah, Swedish people hate git-subtree. Nobody knows why. Avery
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30 2018, Avery Pennarun wrote: > On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller wrote: > There's one exception, which is doing a one-time permanent merge of > two projects into one. That's a nice feature, but is probably used > extremely rarely. FWIW this is the only thing I've used it for. I do this occasionally and used to do this manually with format-patch + "perl -pe" or similar when I needed to merge some repositories together, and then some other times I was less stupid and manually started doing something similar to what subtree is doing with a "move everything" commit just before the merge of the two histories. >> https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule >> >> Not sure what to make of this data. > > Clearly people need a lot more help when using submodules than when > using subtree :) Pretty clear it's garbage data, unless we're to believe that the relative interest of submodules in the US, Germany and Sweden is 51, 64 & 84, but 75, 100 and 0 for subtree.
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 2:53 PM, Avery Pennarun wrote: > For the best of both worlds, I've often thought that a good balance > would be to use the same data structure that submodule uses, but to > store all the code in a single git repo under different refs, which we > might or might not download (or might or might not have different > ACLs) under different circumstances. There has been some experimentation with having a simpler ref surface on the submodule side, https://public-inbox.org/git/cover.1512168087.git.jonathanta...@google.com/ The way you describe the future of submodules, all we'd have to do is to teach git-clone how to select the "interesting" refs for your use case. Any other command would assume all submodule data to be in the main repository. The difference from Jonathan's proposal linked above would be that the object store is in the main repo and the refs are prefixed per submodule instead of "shadowed". > However, when some projects get > really huge (lots of very big submodule dependencies), then repacking > one-big-repo starts becoming unwieldy; in that situation git-subtree > also fails completely. Yes, but that is a general scaling problem of Git that could be tackled, e.g. repack into multiple packs serially instead of putting everything into one pack. >> Submodules do not need to produce a synthetic project history >> when splitting off again, as the history is genuine. This allows >> for easier work with upstream. > > Splitting for easier work upstream is great, and there really ought to > be an official version of 'git subtree split', which is good for all > sorts of purposes. > > However, I suspect almost all uses of the split feature are a) > splitting a subtree that you previously merged in, or b) splitting a > subtree into a separate project that you want to maintain separately > from now on.
> Repeated splits in case (a) are only necessary because > you're not using submodules, or in case (b) are only necessary because > you didn't *switch* to submodules when it finally came time to split > the projects. (In both cases you probably didn't switch to submodules > because you didn't like one of its tradeoffs, especially the need to > track multiple repos when you fork.) That makes sense. > > There's one exception, which is doing a one-time permanent merge of > two projects into one. That's a nice feature, but is probably used > extremely rarely. More often people get into a > merge-split-merge-split cycle that would be better served by a > slightly improved git-submodule. This rare use case is how git-subtree came into existence in Git's contrib directory AFAICT, https://kernel.googlesource.com/pub/scm/git/git/+/634392b26275fe5436c0ea131bc89b46476aa4ae which is interesting to view in git-show, but I think defaults could be tweaked there, as it currently shows me mostly a license file. >> Conceptually Gerrit is doing >> >> while true: >> git submodule update --remote >> if worktree is dirty: >> git commit "update the submodules" >> >> just that Gerrit doesn't poll but does it event based. > > ...and it's super handy :) The problem is it's fundamentally > centralized: because gerrit can serialize merges into the submodule, > it also knows exactly how to update the link in the supermodule. If > there was wild branching and merging (as there often is in git) and > you had to resolve conflicts between two submodules, I don't think it > would be obvious at all how to do it automatically when pushing a > submodule. (This also works quite badly with git subtree --squash.) With the poll based solution I don't think you'd run into many more problems than you would with Gerrit's solution.
In a nearby thread, we were just discussing the submodule merging strategies, https://public-inbox.org/git/1524739599.20251.17.ca...@klsmartin.com/ which might seem confusing, but the implementation is actually easy as we just fastforward-only in submodules. >> >> https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule >> >> Not sure what to make of this data. > > Clearly people need a lot more help when using submodules than when > using subtree :) That could be true. :) Thanks, Stefan
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller wrote: > On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun wrote: > No objections from me either. > > Submodules seem to serve a slightly different purpose, though? I think the purpose is actually the same - it's just the tradeoffs that are different. Both sets of tradeoffs kind of suck. > With Subtrees the superproject always contains all the code, > even when you squash the subtree history when merging it in. > In the submodule world, you may not have access to one of the > submodules. Right. Personally I think it's a disadvantage of subtree that it always contains all the code (what if some people don't want the code for a particular build variant?). However, it's a huge pain that submodules *don't* contain all the code (what if I'm not online right now, or the site supposedly containing the code goes offline, or I want to make my own fork?). For the best of both worlds, I've often thought that a good balance would be to use the same data structure that submodule uses, but to store all the code in a single git repo under different refs, which we might or might not download (or might or might not have different ACLs) under different circumstances. However, when some projects get really huge (lots of very big submodule dependencies), then repacking one-big-repo starts becoming unwieldy; in that situation git-subtree also fails completely. > Submodules do not need to produce a synthetic project history > when splitting off again, as the history is genuine. This allows > for easier work with upstream. Splitting for easier work upstream is great, and there really ought to be an official version of 'git subtree split', which is good for all sorts of purposes. However, I suspect almost all uses of the split feature are a) splitting a subtree that you previously merged in, or b) splitting a subtree into a separate project that you want to maintain separately from now on.
Repeated splits in case (a) are only necessary because you're not using submodules, or in case (b) are only necessary because you didn't *switch* to submodules when it finally came time to split the projects. (In both cases you probably didn't switch to submodules because you didn't like one of its tradeoffs, especially the need to track multiple repos when you fork.) > Subtrees present you the whole history by default and the user > needs to be explicit about not wanting to see history from the > subtree, which is the opposite of submodules (though this > may be planned in the future to switch). It turns out that AFAIK, almost everyone prefers 'git subtree --squash', which squashes into a single commit each time you merge, much like git submodule does. I doubt people would cry too much if the full-history feature went away. There's one exception, which is doing a one-time permanent merge of two projects into one. That's a nice feature, but is probably used extremely rarely. More often people get into a merge-split-merge-split cycle that would be better served by a slightly improved git-submodule. >> The gerrit team (eg. Stefan Beller) has been doing some really great >> stuff to make submodules more usable by helping with relative >> submodule links and by auto-updating links in supermodules at the >> right times. Unfortunately doing that requires help from the server >> side, which kind of messes up decentralization and so doesn't solve >> the problem in the general case. > > Conceptually Gerrit is doing > > while true: > git submodule update --remote > if worktree is dirty: > git commit "update the submodules" > > just that Gerrit doesn't poll but does it event based. ...and it's super handy :) The problem is it's fundamentally centralized: because gerrit can serialize merges into the submodule, it also knows exactly how to update the link in the supermodule. 
If there was wild branching and merging (as there often is in git) and you had to resolve conflicts between two submodules, I don't think it would be obvious at all how to do it automatically when pushing a submodule. (This also works quite badly with git subtree --squash.) >> I really wish there were a good answer, but I don't know what it is. >> I do know that lots of people seem to at least be happy using >> git-subtree, and would be even happier if it were installed >> automatically with git. > > https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule > > Not sure what to make of this data. Clearly people need a lot more help when using submodules than when using subtree :) Have fun, Avery
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun wrote: > On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason > wrote: >> I think at this point git-subtree is widely used enough to move out of >> contrib/, maybe others disagree, but patches are always better for >> discussion than patch-less ML posts. > > I really was hoping git-subtree would be completely obsoleted by > git-submodule by now, but... it hasn't been, so... no objections on > this end. No objections from me either. Submodules seem to serve a slightly different purpose, though? With Subtrees the superproject always contains all the code, even when you squash the subtree history when merging it in. In the submodule world, you may not have access to one of the submodules. Submodules do not need to produce a synthetic project history when splitting off again, as the history is genuine. This allows for easier work with upstream. Subtrees present you the whole history by default and the user needs to be explicit about not wanting to see history from the subtree, which is the opposite of submodules (though this may be planned in the future to switch). > The gerrit team (eg. Stefan Beller) has been doing some really great > stuff to make submodules more usable by helping with relative > submodule links and by auto-updating links in supermodules at the > right times. Unfortunately doing that requires help from the server > side, which kind of messes up decentralization and so doesn't solve > the problem in the general case. Conceptually Gerrit is doing while true: git submodule update --remote if worktree is dirty: git commit "update the submodules" just that Gerrit doesn't poll but does it event based. > I really wish there were a good answer, but I don't know what it is. > I do know that lots of people seem to at least be happy using > git-subtree, and would be even happier if it were installed > automatically with git.
https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule Not sure what to make of this data.
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason wrote: > I think at this point git-subtree is widely used enough to move out of > contrib/, maybe others disagree, but patches are always better for > discussion than patch-less ML posts. I really was hoping git-subtree would be completely obsoleted by git-submodule by now, but... it hasn't been, so... no objections on this end. The gerrit team (eg. Stefan Beller) has been doing some really great stuff to make submodules more usable by helping with relative submodule links and by auto-updating links in supermodules at the right times. Unfortunately doing that requires help from the server side, which kind of messes up decentralization and so doesn't solve the problem in the general case. I really wish there were a good answer, but I don't know what it is. I do know that lots of people seem to at least be happy using git-subtree, and would be even happier if it were installed automatically with git. Have fun, Avery
Re: [PATCH 0/4] subtree: move out of contrib
From: "Ævar Arnfjörð Bjarmason"I think at this point git-subtree is widely used enough to move out of contrib/, maybe others disagree, but patches are always better for discussion that patch-less ML posts. Assuming this lands in Git, then there will also need to be a simple follow on into Duy's series that is updating the command-list.txt (Message-Id: <20180429181844.21325-10-pclo...@gmail.com>). Duy's series also does the completions thing IIUC;-). -- Philip Ævar Arnfjörð Bjarmason (4): git-subtree: move from contrib/subtree/ subtree: remove support for git version <1.7 subtree: fix a test failure under GETTEXT_POISON i18n: translate the git-subtree command .gitignore| 1 + Documentation/git-submodule.txt | 2 +- .../subtree => Documentation}/git-subtree.txt | 3 + Makefile | 1 + contrib/subtree/.gitignore| 7 - contrib/subtree/COPYING | 339 -- contrib/subtree/INSTALL | 28 -- contrib/subtree/Makefile | 97 - contrib/subtree/README| 8 - contrib/subtree/t/Makefile| 86 - contrib/subtree/todo | 48 --- .../subtree/git-subtree.sh => git-subtree.sh | 109 +++--- {contrib/subtree/t => t}/t7900-subtree.sh | 21 +- 13 files changed, 78 insertions(+), 672 deletions(-) rename {contrib/subtree => Documentation}/git-subtree.txt (99%) delete mode 100644 contrib/subtree/.gitignore delete mode 100644 contrib/subtree/COPYING delete mode 100644 contrib/subtree/INSTALL delete mode 100644 contrib/subtree/Makefile delete mode 100644 contrib/subtree/README delete mode 100644 contrib/subtree/t/Makefile delete mode 100644 contrib/subtree/todo rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%) rename {contrib/subtree/t => t}/t7900-subtree.sh (99%) -- 2.17.0.290.gded63e768a
[PATCH 0/4] subtree: move out of contrib
I think at this point git-subtree is widely used enough to move out of contrib/, maybe others disagree, but patches are always better for discussion than patch-less ML posts. Ævar Arnfjörð Bjarmason (4): git-subtree: move from contrib/subtree/ subtree: remove support for git version <1.7 subtree: fix a test failure under GETTEXT_POISON i18n: translate the git-subtree command .gitignore| 1 + Documentation/git-submodule.txt | 2 +- .../subtree => Documentation}/git-subtree.txt | 3 + Makefile | 1 + contrib/subtree/.gitignore| 7 - contrib/subtree/COPYING | 339 -- contrib/subtree/INSTALL | 28 -- contrib/subtree/Makefile | 97 - contrib/subtree/README| 8 - contrib/subtree/t/Makefile| 86 - contrib/subtree/todo | 48 --- .../subtree/git-subtree.sh => git-subtree.sh | 109 +++--- {contrib/subtree/t => t}/t7900-subtree.sh | 21 +- 13 files changed, 78 insertions(+), 672 deletions(-) rename {contrib/subtree => Documentation}/git-subtree.txt (99%) delete mode 100644 contrib/subtree/.gitignore delete mode 100644 contrib/subtree/COPYING delete mode 100644 contrib/subtree/INSTALL delete mode 100644 contrib/subtree/Makefile delete mode 100644 contrib/subtree/README delete mode 100644 contrib/subtree/t/Makefile delete mode 100644 contrib/subtree/todo rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%) rename {contrib/subtree/t => t}/t7900-subtree.sh (99%) -- 2.17.0.290.gded63e768a
Re: [PATCH 0/4] doc: cleaning up instances of \--
On Tue, Apr 17, 2018 at 09:15:25PM +0200, Martin Ågren wrote: > This is a patch series to convert \-- to -- in our documentation. The > first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to > --option, 2015-05-13) to fix some instances that have appeared since. > The other three patches deal with standalone "\--" which we can't > always turn into "--" since it can be rendered as an em dash. This series looked sane to me as well. I appreciate the work to keep our documentation working in both AsciiDoc and Asciidoctor. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204
Re: [PATCH 0/4] doc: cleaning up instances of \--
Martin Ågren writes: > This is a patch series to convert \-- to -- in our documentation. The > first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to > --option, 2015-05-13) to fix some instances that have appeared since. > The other three patches deal with standalone "\--" which we can't > always turn into "--" since it can be rendered as an em dash. All looked sensible. As you mentioned in [2/4], "\--::" that is part of an enumeration appears in documentation for about a dozen commands after the series, but I do not think we can avoid it. One thing that makes me wonder related to these patches is if a newer AsciiDoc we assume lets us do without {litdd} macro. This series and our earlier effort like 1c262bb7 ("doc: convert \--option to --option", 2015-05-13) mentions that "\--word" is less pleasant on the eyes than "--word", but the ugliness "two{litdd}words" has over "two--words" is far worse than that, so... Thanks, will queue.
[PATCH 0/4] doc: cleaning up instances of \--
This is a patch series to convert \-- to -- in our documentation. The first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to --option, 2015-05-13) to fix some instances that have appeared since. The other three patches deal with standalone "\--" which we can't always turn into "--" since it can be rendered as an em dash. Martin Martin Ågren (4): doc: convert \--option to --option doc: convert [\--] to [--] git-[short]log.txt: unify quoted standalone -- git-submodule.txt: quote usage in monospace, drop backslash Documentation/git-format-patch.txt | 2 +- Documentation/git-log.txt | 8 Documentation/git-push.txt | 2 +- Documentation/git-shortlog.txt | 6 +++--- Documentation/git-submodule.txt| 4 ++-- Documentation/gitk.txt | 2 +- 6 files changed, 12 insertions(+), 12 deletions(-) -- 2.17.0.252.gfe0a9eaf31
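[Editor's note] For readers outside the thread: in AsciiDoc running text, a bare "--" can be typeset as an em dash, which is why older git documentation escaped it as "\--" or used the {litdd} attribute. An illustrative fragment (invented for this note, not taken from the series):

```asciidoc
// bare form: may render as an em dash in running text
Everything after -- is treated as a path.
// backslash-escaped form the series cleans up where possible
Everything after \-- is treated as a path.
// monospaced form, safe and easier on the eyes
Everything after `--` is treated as a path.
```

Inside backticks (or a listing block) no substitution happens, which is why the series prefers `--` wherever the context allows it.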
Re: [PATCH 0/4] Convert some stash functionality to a builtin
On 03/24, Joel Teichroeb wrote: > I've been working on converting all of git stash to be a > builtin, however it's hard to get it all working at once with > limited time, so I've moved around half of it to a new > stash--helper builtin and called these functions from the shell > script. Once this is stabilized, it should be easier to convert > the rest of the commands one at a time without breaking > anything. > > I've sent most of this code before, but that was targeting a > full replacement of stash. The code is overall the same, but > with some code review changes and updates for internal api > changes. Thanks for splitting this up into multiple patches, I found that much more pleasant to review, and thanks for your continued work on this :) > Since there seems to be interest from GSOC students who want to > work on converting builtins, I figured I should finish what I > have that works now so they could build on top of it. > > Joel Teichroeb (4): > stash: convert apply to builtin > stash: convert branch to builtin > stash: convert drop and clear to builtin > stash: convert pop to builtin > > .gitignore | 1 + > Makefile| 1 + > builtin.h | 1 + > builtin/stash--helper.c | 514 > > git-stash.sh| 13 +- > git.c | 1 + > 6 files changed, 526 insertions(+), 5 deletions(-) > create mode 100644 builtin/stash--helper.c > > -- > 2.16.2 >
[PATCH 0/4] Convert some stash functionality to a builtin
I've been working on converting all of git stash to be a builtin, however it's hard to get it all working at once with limited time, so I've moved around half of it to a new stash--helper builtin and called these functions from the shell script. Once this is stabilized, it should be easier to convert the rest of the commands one at a time without breaking anything. I've sent most of this code before, but that was targeting a full replacement of stash. The code is overall the same, but with some code review changes and updates for internal api changes. Since there seems to be interest from GSOC students who want to work on converting builtins, I figured I should finish what I have that works now so they could build on top of it. Joel Teichroeb (4): stash: convert apply to builtin stash: convert branch to builtin stash: convert drop and clear to builtin stash: convert pop to builtin .gitignore | 1 + Makefile| 1 + builtin.h | 1 + builtin/stash--helper.c | 514 git-stash.sh| 13 +- git.c | 1 + 6 files changed, 526 insertions(+), 5 deletions(-) create mode 100644 builtin/stash--helper.c -- 2.16.2
Re: [PATCH 0/4] Speed up git tag --contains
On Mon, Mar 12, 2018 at 09:45:27AM -0400, Derrick Stolee wrote: > > diff --git a/builtin/branch.c b/builtin/branch.c > > index 8dcc2ed058..4d674e86d5 100644 > > --- a/builtin/branch.c > > +++ b/builtin/branch.c > > @@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, > > struct ref_sorting *sortin > > memset(&array, 0, sizeof(array)); > > + filter->with_commit_tag_algo = 1; > > filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN); > > if (filter->verbose) > > > > drops my run of "git branch -a --contains HEAD~100" from 8.6s to > > 0.4s on a repo with ~1800 branches. That sounds good, but on a repo with > > a smaller number of branches, we may actually end up slower (because we > > dig further down in history, and don't benefit from the multiple-branch > > speedup). > > It's good to know that we already have an algorithm for the multi-head > approach. Things like `git branch -vv` are harder to tease out because the > graph walk is called by the line-format code. Yeah, the ahead/behind stuff will need some work. Part of it is just code structuring. We know ahead of time which branches (and their upstreams) are going to need this ahead/behind computation, so we should be able to collect them all for a single call. But I'm not sure if a general multi-pair ahead/behind is going to be easy. I don't have even experimental code for that. :) We have a multi-pair ahead/behind command which we use at GitHub, but it does each pair separately. It leans heavily on reachability bitmaps, so the main advantage is that it's able to amortize the cost of loading the bitmaps (both off disk, but also we sometimes have to walk to complete the bitmaps). -Peff
Re: [PATCH 0/4] Speed up git tag --contains
On 3/3/2018 12:15 AM, Jeff King wrote:
> On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:
> > > This is a resubmission of Jeff King's patch series to speed up git tag
> > > --contains with some changes. It's been cooking for a while as:
> >
> > Replying to this 6-year-old thread:
> >
> > Is there any chance this could be resurrected? We are using
> > phabricator, which uses `git branch --contains` as part of its
> > workflow. Our repo has ~1000 branches on it, and the contains
> > operation is eating up all our CPU (and time). It would be very
> > helpful to us to make this faster!
> >
> > (The original thread is at
> > https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/ )
>
> Sorry, this got thrown on my "to respond" pile and languished.

Thanks for adding me to the thread. It's good to know the pain points people are having around commit graph walks.

> There are actually three things that make "git branch --contains" slow.
> First, if you're filtering 1000 branches, we'll run 1000 merge-base
> traversals, which may walk over the same commits multiple times. These
> days "tag --contains" uses a different algorithm that can look at all
> heads in a single traversal. But the downside is that it's depth-first,
> so it tends to walk down to the roots. That's generally OK for tags,
> since you often have ancient tags that mean getting close to the roots
> anyway. But for branches, they're more likely to be recent, and you can
> get away without going very deep into the history. So it's a tradeoff.
> There's no run-time switch to flip between them, but a patch like this:
>
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 8dcc2ed058..4d674e86d5 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
>  	memset(&array, 0, sizeof(array));
> +	filter->with_commit_tag_algo = 1;
>  	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
>  	if (filter->verbose)
>
> drops my run of "git branch -a --contains HEAD~100" from 8.6s to 0.4s
> on a repo with ~1800 branches. That sounds good, but on a repo with a
> smaller number of branches, we may actually end up slower (because we
> dig further down in history, and don't benefit from the multiple-branch
> speedup).

It's good to know that we already have an algorithm for the multi-head approach. Things like `git branch -vv` are harder to tease out because the graph walk is called by the line-format code.

> I tried to do a "best of both" algorithm in:
>
>   https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/
>
> which finds arbitrary numbers of merge bases in a single traversal. It
> did seem to work, but I felt uneasy about some of the corner cases. I've
> been meaning to revisit it, but obviously have never gotten around to it.
>
> The second slow thing is that during the traversal we load each commit
> object from disk. The solution there is to keep the parent information
> in a faster cache. I had a few proposals over the years, but I won't
> even bother to dig them up, because there's quite recent and promising
> work in this area from Derrick Stolee:
>
>   https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/
>
> And finally, the thing that the patches you linked are referencing is
> about using commit timestamps as a proxy for generation numbers. And
> Stolee's patches actually leave room for real, trustable generation
> numbers.
> Once we have the serialized commit graph and generation numbers, I
> think the final step would just be to teach the "tag --contains"
> algorithm to stop walking down unproductive lines of history. And in
> fact, I think we can forget about the best-of-both multi-tip merge-base
> idea entirely. Because if you can use the generation numbers to avoid
> going too deep, then a depth-first approach is fine. And we'd just want
> to flip git-branch over to using that algorithm by default.

I'll keep this in mind as a target for performance measurements in the serialized commit graph patch and the following generation number patch.

Thanks,
-Stolee
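As a rough illustration of the cutoff discussed above (a hypothetical Python sketch, not the eventual git implementation): if every commit carries a generation number — 1 for roots, otherwise one more than the maximum of its parents' — then no commit can reach an ancestor with a higher generation number, so a depth-first "contains" walk can simply refuse to descend below the target's generation.

```python
# Illustrative sketch: a depth-first "contains" walk pruned by generation
# numbers. Commit and its `gen` field are hypothetical stand-ins, not
# git's real commit-graph structures.

class Commit:
    def __init__(self, gen, parents=()):
        self.gen = gen                  # 1 for roots, else 1 + max(parent gens)
        self.parents = list(parents)

def contains(tip, target):
    """True if target is an ancestor of (or equal to) tip."""
    stack, seen = [tip], set()
    while stack:
        c = stack.pop()
        if c is target:
            return True
        if id(c) in seen:
            continue
        seen.add(id(c))
        for p in c.parents:
            # Pruning step: a commit whose generation is below the
            # target's can never reach it, so don't walk past it.
            if p.gen >= target.gen:
                stack.append(p)
    return False
```

For a recent branch tip and a recent target commit, such a walk never descends toward the roots — which is exactly the case where the depth-first algorithm was previously expensive.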
Re: [PATCH 0/4] Speed up git tag --contains
} I had a few proposals over the years, but I won't even bother to dig
} them up, because there's quite recent and promising work in this
} area from Derrick Stolee:

It sounds like the best thing to do is to wait for this, then. We managed to convert a bunch of our branches to tags, so our immediate problem has been resolved. But I'm sure it will come up again as more branches are created...

craig
Re: [PATCH 0/4] Teach `--default` to `git-config(1)`
On Mon, Mar 05, 2018 at 06:17:25PM -0800, Taylor Blau wrote:
> Attached is a patch series to introduce `--default` and `--color` to the
> `git-config(1)` builtin with the aim of introducing a consistent interface to
> replace `--get-color`.

This series draws significant inspiration from Soukaina Nait Hmid's work in:

  https://public-inbox.org/git/0102015fb0bf2f74-cb456171-fe65-4d83-8784-b553c7c9e584-000...@eu-west-1.amazonses.com/

-- 
Taylor
[PATCH 0/4] Teach `--default` to `git-config(1)`
Hi,

Attached is a patch series to introduce `--default` and `--color` to the `git-config(1)` builtin with the aim of introducing a consistent interface to replace `--get-color`.

Thank you in advance for reviewing this series; I will be more than happy to respond to any feedback seeing as I am still quite new to working on Git itself.

Thanks,
Taylor

Taylor Blau (4):
  builtin/config: introduce `--default`
  Documentation: list all type specifiers in config prose
  config.c: introduce 'git_config_color' to parse ANSI colors
  builtin/config: introduce `--color` type specifier

 Documentation/git-config.txt |  21 +---
 builtin/config.c             |  36 +
 config.c                     |  10 
 config.h                     |   1 +
 t/t1300-repo-config.sh       |  16 ++
 t/t1310-config-default.sh    | 119 +++
 6 files changed, 197 insertions(+), 6 deletions(-)
 create mode 100755 t/t1310-config-default.sh

-- 
2.15.1.354.g95ec6b1b3
Re: [PATCH 0/4] Speed up git tag --contains
On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:

> > This is a resubmission of Jeff King's patch series to speed up git tag
> > --contains with some changes. It's been cooking for a while as:
>
> Replying to this 6-year-old thread:
>
> Is there any chance this could be resurrected? We are using
> phabricator, which uses `git branch --contains` as part of its
> workflow. Our repo has ~1000 branches on it, and the contains
> operation is eating up all our CPU (and time). It would be very
> helpful to us to make this faster!
>
> (The original thread is at
> https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/ )

Sorry, this got thrown on my "to respond" pile and languished.

There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base traversals, which may walk over the same commits multiple times. These days "tag --contains" uses a different algorithm that can look at all heads in a single traversal. But the downside is that it's depth-first, so it tends to walk down to the roots. That's generally OK for tags, since you often have ancient tags that mean getting close to the roots anyway. But for branches, they're more likely to be recent, and you can get away without going very deep into the history. So it's a tradeoff.

There's no run-time switch to flip between them, but a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	memset(&array, 0, sizeof(array));
+	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
 	if (filter->verbose)

drops my run of "git branch -a --contains HEAD~100" from 8.6s to 0.4s on a repo with ~1800 branches.
That sounds good, but on a repo with a smaller number of branches, we may actually end up slower (because we dig further down in history, and don't benefit from the multiple-branch speedup).

I tried to do a "best of both" algorithm in:

  https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/

which finds arbitrary numbers of merge bases in a single traversal. It did seem to work, but I felt uneasy about some of the corner cases. I've been meaning to revisit it, but obviously have never gotten around to it.

The second slow thing is that during the traversal we load each commit object from disk. The solution there is to keep the parent information in a faster cache. I had a few proposals over the years, but I won't even bother to dig them up, because there's quite recent and promising work in this area from Derrick Stolee:

  https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is about using commit timestamps as a proxy for generation numbers. And Stolee's patches actually leave room for real, trustable generation numbers.

Once we have the serialized commit graph and generation numbers, I think the final step would just be to teach the "tag --contains" algorithm to stop walking down unproductive lines of history. And in fact, I think we can forget about the best-of-both multi-tip merge-base idea entirely. Because if you can use the generation numbers to avoid going too deep, then a depth-first approach is fine. And we'd just want to flip git-branch over to using that algorithm by default.

-Peff
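The "all heads in a single traversal" idea mentioned above can be sketched roughly like this (illustrative Python with hypothetical structures — not git's actual C implementation): a reachability memo shared across every ref's walk means each commit's "does it reach the target?" answer is computed at most once, no matter how many branches are being tested.

```python
# Illustrative sketch (not git's code): test many refs against one target
# commit, sharing a memo so each commit is evaluated only once overall.

class Commit:
    def __init__(self, parents=()):
        self.parents = list(parents)

def make_contains_checker(target):
    """Return a memoized 'can this commit reach target?' predicate."""
    cache = {}

    def reaches(commit):
        if commit is target:
            return True
        key = id(commit)
        if key not in cache:
            cache[key] = False  # placeholder; also stops re-entry on repeats
            cache[key] = any(reaches(p) for p in commit.parents)
        return cache[key]

    return reaches

def branches_containing(refs, target):
    """refs: {name: tip_commit}. Names whose tip contains target, sorted."""
    reaches = make_contains_checker(target)
    return sorted(name for name, tip in refs.items() if reaches(tip))
```

With a thousand refs sharing one cache, the total cost is bounded by the number of commits visited, rather than refs times commits — which is the win over running a separate merge-base traversal per branch.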
Re: [PATCH 0/4] Delete ignore_env member in struct repository
On Tue, Feb 27, 2018 at 3:46 AM, Stefan Beller wrote:
> What is the plan from here on?
> Should I build further series on top of yours? The next series will
> focus on the pack side of things (pack.h, packfile.{c,h})
>
> So maybe we'll have Junio merge down my series (and yours as it
> needs one reroll?) and then build on top of that?

I think that's the less painful way for everybody.
-- 
Duy
Re: [PATCH 0/4] Delete ignore_env member in struct repository
On Mon, Feb 26, 2018 at 2:30 AM, Nguyễn Thái Ngọc Duy wrote:
> It turns out I don't need my other series [1] in order to delete this
> field. This series moves getenv() calls from
> repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in
> environment.c where they belong in my opinion.
>
> The repo_set_gitdir() now takes $GIT_DIR and optionally all other
> configurable paths. If those paths are NULL, default repo layout will
> be used. With getenv() no longer called inside repo_set_gitdir(),
> ignore_env has no reason to stay. This is in 1/4.
>
> The getenv() in prepare_alt_odb() is also moved back to
> setup_git_env() in 3/4. It demonstrates how we could move other
> getenv() back too if we want.
>
> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.

Thanks for working on this. I found this series a pleasant read; the only issue I saw was Eric's point about multiple getenv() calls without strdup()ing the content.

What is the plan from here on? Should I build further series on top of yours? The next series will focus on the pack side of things (pack.h, packfile.{c,h}).

So maybe we'll have Junio merge down my series (and yours, as it needs one reroll?) and then build on top of that?

Thanks,
Stefan
Re: [PATCH 0/4] Delete ignore_env member in struct repository
Nguyễn Thái Ngọc Duy writes:

> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.

As a follow-up fix for the combined "move things to the_repo as much as possible" efforts, I think what you did makes the most sense. I would say that it makes even more sense to make these part of the object-store series if/when the series needs rerolling--after all, the topic is already a multi-author effort.

I'll have to read them carefully before commenting on the actual patches ;-)

Thanks.

> [1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/
>
> Nguyễn Thái Ngọc Duy (4):
>   repository.c: move env-related setup code back to environment.c
>   repository.c: delete dead functions
>   sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
>   repository: delete ignore_env member
>
>  cache.h        |  2 +-
>  environment.c  | 13 +++--
>  object-store.h |  5 +++-
>  object.c       |  1 +
>  repository.c   | 79 ++
>  repository.h   | 21 +++---
>  setup.c        |  3 +-
>  sha1_file.c    |  6 +---
>  8 files changed, 64 insertions(+), 66 deletions(-)
[PATCH 0/4] Delete ignore_env member in struct repository
It turns out I don't need my other series [1] in order to delete this field. This series moves getenv() calls from repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back into environment.c, where they belong in my opinion.

The repo_set_gitdir() now takes $GIT_DIR and optionally all other configurable paths. If those paths are NULL, the default repo layout will be used. With getenv() no longer called inside repo_set_gitdir(), ignore_env has no reason to stay. This is in 1/4.

The getenv() in prepare_alt_odb() is also moved back to setup_git_env() in 3/4. It demonstrates how we could move other getenv() calls back too if we want.

This series is built on top of Stefan's object-store-part1, v4. I could rebase it on 'master' too, but then Junio may need to resolve some conflicts.

[1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/

Nguyễn Thái Ngọc Duy (4):
  repository.c: move env-related setup code back to environment.c
  repository.c: delete dead functions
  sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
  repository: delete ignore_env member

 cache.h        |  2 +-
 environment.c  | 13 +++--
 object-store.h |  5 +++-
 object.c       |  1 +
 repository.c   | 79 ++
 repository.h   | 21 +++---
 setup.c        |  3 +-
 sha1_file.c    |  6 +---
 8 files changed, 64 insertions(+), 66 deletions(-)

-- 
2.16.1.435.g8f24da2e1a
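The setup shape described in 1/4 can be sketched like this (an illustrative Python sketch with hypothetical names — the real code is C and the actual signatures differ): the function itself never calls getenv(); the caller resolves environment variables and passes explicit paths, and anything left unset falls back to the standard layout under the gitdir.

```python
import os

# Illustrative sketch of "explicit paths in, defaults out": unset entries
# fall back to the standard repo layout derived from the gitdir. Function
# and field names here are hypothetical, not the C API.

def repo_set_gitdir(gitdir, object_dir=None, index_file=None, graft_file=None):
    return {
        "gitdir": gitdir,
        "object_dir": object_dir or os.path.join(gitdir, "objects"),
        "index_file": index_file or os.path.join(gitdir, "index"),
        "graft_file": graft_file or os.path.join(gitdir, "info", "grafts"),
    }

# The caller (the environment.c side, in the series' layout) is the one
# place that consults the environment:
def setup_from_env(environ):
    gitdir = environ.get("GIT_DIR", ".git")
    return repo_set_gitdir(
        gitdir,
        object_dir=environ.get("GIT_OBJECT_DIRECTORY"),
        index_file=environ.get("GIT_INDEX_FILE"),
    )
```

With getenv() confined to the caller, the setup function no longer needs an ignore_env-style flag to decide whether the environment should be consulted — which is the point of the series.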
Re: [PATCH 0/4] Correct offsets of hunks when one is skipped
On 13/02/18 23:56, brian m. carlson wrote:
> On Tue, Feb 13, 2018 at 10:44:04AM +0000, Phillip Wood wrote:
>> From: Phillip Wood
>>
>> While working on a patch series to stage selected lines from a hunk
>> without having to edit it, I got worried that subsequent patches would
>> be applied in the wrong place, which led to this series to correct the
>> offsets of hunks following those that are skipped or edited.
>>
>> Phillip Wood (4):
>>   add -i: add function to format hunk header
>>   t3701: add failing test for pathological context lines
>>   add -p: Adjust offsets of subsequent hunks when one is skipped
>>   add -p: calculate offset delta for edited patches
>>
>>  git-add--interactive.perl  | 93 +++---
>>  t/t3701-add-interactive.sh | 30 +++
>>  2 files changed, 102 insertions(+), 21 deletions(-)
>
> This looks reasonably sane to me. I really like that you managed to
> produce failing tests for this situation. I know pathological cases
> like this have bitten GCC in the past, so it's good that you fixed this.

Thanks Brian, it's interesting to hear that GCC has been bitten in the past.

Best Wishes
Phillip
Re: [PATCH 0/4] Correct offsets of hunks when one is skipped
On Tue, Feb 13, 2018 at 10:44:04AM +0000, Phillip Wood wrote:
> From: Phillip Wood
>
> While working on a patch series to stage selected lines from a hunk
> without having to edit it, I got worried that subsequent patches would
> be applied in the wrong place, which led to this series to correct the
> offsets of hunks following those that are skipped or edited.
>
> Phillip Wood (4):
>   add -i: add function to format hunk header
>   t3701: add failing test for pathological context lines
>   add -p: Adjust offsets of subsequent hunks when one is skipped
>   add -p: calculate offset delta for edited patches
>
>  git-add--interactive.perl  | 93 +++---
>  t/t3701-add-interactive.sh | 30 +++
>  2 files changed, 102 insertions(+), 21 deletions(-)

This looks reasonably sane to me. I really like that you managed to produce failing tests for this situation. I know pathological cases like this have bitten GCC in the past, so it's good that you fixed this.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
[PATCH 0/4] Correct offsets of hunks when one is skipped
From: Phillip Wood

While working on a patch series to stage selected lines from a hunk without having to edit it, I got worried that subsequent patches would be applied in the wrong place, which led to this series to correct the offsets of hunks following those that are skipped or edited.

Phillip Wood (4):
  add -i: add function to format hunk header
  t3701: add failing test for pathological context lines
  add -p: Adjust offsets of subsequent hunks when one is skipped
  add -p: calculate offset delta for edited patches

 git-add--interactive.perl  | 93 +++---
 t/t3701-add-interactive.sh | 30 +++
 2 files changed, 102 insertions(+), 21 deletions(-)

-- 
2.16.1
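The bookkeeping this series adds can be illustrated with a small sketch (Python for brevity — the actual implementation lives in git-add--interactive.perl, and this helper is hypothetical): when an earlier hunk is skipped, it no longer inserts or deletes lines, so the expected new-file start of every following hunk must shift by that hunk's net line delta.

```python
# Illustrative sketch of the offset correction: hunks are represented as
# (old_start, old_count, new_start, new_count) tuples, mirroring a
# "@@ -old_start,old_count +new_start,new_count @@" header.

def adjust_offsets(hunks, skipped):
    """Drop skipped hunks and fix new_start of the hunks that follow them."""
    delta = 0
    kept = []
    for i, (old_start, old_count, new_start, new_count) in enumerate(hunks):
        if i in skipped:
            # This hunk won't be applied, so later hunks lose its net
            # effect of (new_count - old_count) lines.
            delta -= new_count - old_count
        else:
            kept.append((old_start, old_count, new_start + delta, new_count))
    return kept
```

An edited hunk would be handled the same way, except the delta is the difference between the edited and original line counts rather than the hunk's full delta.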
Re: [PATCH 0/4] Speed up git tag --contains
> This is a resubmission of Jeff King's patch series to speed up git tag > --contains with some changes. It's been cooking for a while as: Replying to this 6-year-old thread: Is there any chance this could be resurrected? We are using phabricator, which uses `git branch --contains` as part of its workflow. Our repo has ~1000 branches on it, and the contains operation is eating up all our CPU (and time). It would be very helpful to us to make this faster! (The original thread is at https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/ ) craig