Re: [PATCH 0/4]

2018-12-07 Thread Junio C Hamano
Stefan Beller  writes:

> A couple of days before the 2.19 release we had a bug report about
> broken submodules[1] and reverted[2] the commits that led to the
> breakage.
>
> The bug itself was later fixed by a different approach[3],
> specifically a weaker enforcement of having `core.worktree` set in a
> submodule [4].
>
> The revert [2] was overly broad because we were close to the release:
> we preferred to keep the known buggy behavior of always having
> `core.worktree` set rather than figure out how to fix the new bug of
> 'git submodule update' not working in old-style repository setups.
>
> This series re-introduces those reverted patches with no changes in code,
> but with drastically changed commit messages, which now focus on why it is
> safe to re-introduce them instead of explaining the desire for the change.

The above was a bit too cryptic for me to grok, so let me try
rephrasing to see if I got them all correctly.

 - the three-patch series leading to 984cd77ddb was meant to fix some
   bug, but the series itself was buggy and caused problems; we got
   rid of it

 - the problem 984cd77ddb wanted to fix was fixed differently,
   without reintroducing the problems that the three-patch series had
   introduced.  That fix has been with us since 4d6d6ef1fc.

 - now these three changes that were problematic in the past are
   resent without any update (other than one preparatory patch to
   add tests).

Is that what is going on?  Obviously I am not getting "the other"
benefit we wanted to gain out of these three patches (because the
above description fails to explain what that is), other than to fix
the issue that was fixed by 4d6d6ef1fc.

Sorry for being puzzled...

> [1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
> [2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
> [3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
> [4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)
>
> Stefan Beller (4):
>   submodule update: add regression test with old style setups
>   submodule: unset core.worktree if no working tree is present
>   submodule--helper: fix BUG message in ensure_core_worktree
>   submodule deinit: unset core.worktree
>
>  builtin/submodule--helper.c|  4 +++-
>  submodule.c| 14 ++
>  submodule.h|  2 ++
>  t/lib-submodule-update.sh  |  5 +++--
>  t/t7400-submodule-basic.sh |  5 +
>  t/t7412-submodule-absorbgitdirs.sh |  7 ++-
>  6 files changed, 33 insertions(+), 4 deletions(-)


[PATCH 0/4]

2018-12-07 Thread Stefan Beller
A couple of days before the 2.19 release we had a bug report about
broken submodules[1] and reverted[2] the commits that led to the
breakage.

The bug itself was later fixed by a different approach[3],
specifically a weaker enforcement of having `core.worktree` set in a
submodule [4].

The revert [2] was overly broad because we were close to the release:
we preferred to keep the known buggy behavior of always having
`core.worktree` set rather than figure out how to fix the new bug of
'git submodule update' not working in old-style repository setups.

This series re-introduces those reverted patches with no changes in code,
but with drastically changed commit messages, which now focus on why it is
safe to re-introduce them instead of explaining the desire for the change.

[1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
[2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
[3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
[4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)

Stefan Beller (4):
  submodule update: add regression test with old style setups
  submodule: unset core.worktree if no working tree is present
  submodule--helper: fix BUG message in ensure_core_worktree
  submodule deinit: unset core.worktree
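The central rule of patch 2/4 ("unset core.worktree if no working tree is present") can be sketched roughly as follows. This is purely illustrative: real git does this in C through its config API, and the helper names here (`maybe_unset_core_worktree`, `resolve_worktree`) are invented for the sketch.

```python
import os
import tempfile

def resolve_worktree(git_dir, worktree):
    # A relative core.worktree is interpreted relative to the gitdir.
    if os.path.isabs(worktree):
        return worktree
    return os.path.normpath(os.path.join(git_dir, worktree))

def maybe_unset_core_worktree(git_dir):
    """Drop core.worktree from the config file if it points at a
    directory that does not exist, i.e. the submodule has no working
    tree; returns True if the setting was removed."""
    config_path = os.path.join(git_dir, "config")
    kept, removed = [], False
    with open(config_path) as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "worktree" and \
                    not os.path.isdir(resolve_worktree(git_dir, value.strip())):
                removed = True
                continue
            kept.append(line)
    if removed:
        with open(config_path, "w") as f:
            f.writelines(kept)
    return removed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        with open(os.path.join(git_dir, "config"), "w") as f:
            f.write("[core]\n\tbare = false\n\tworktree = no-such-dir\n")
        print(maybe_unset_core_worktree(git_dir))  # True: no-such-dir is missing
```

The line-based rewrite is a simplification (it would also match a `worktree` key outside `[core]`), but it captures the decision: the presence of the setting is only enforced when a working tree actually exists.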

 builtin/submodule--helper.c|  4 +++-
 submodule.c| 14 ++
 submodule.h|  2 ++
 t/lib-submodule-update.sh  |  5 +++--
 t/t7400-submodule-basic.sh |  5 +
 t/t7412-submodule-absorbgitdirs.sh |  7 ++-
 6 files changed, 33 insertions(+), 4 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-31 Thread Johannes Schindelin
Hi Junio,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" writes:
> 
> > An alternative approach which was rejected at the time (because it
> > interfered with the then-ongoing work to compile Git for Windows using
> > MS Visual C++) would patch the make_environment_block() function to
> > skip the specified environment variables before handing down the
> > environment block to the spawned process. Currently it would interfere
> > with the mingw-utf-8-env patch series I sent earlier today
> > [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].
> 
> So the rejected approach that was not used in this series would
> interfere with the other one, but I do not have to worry about it
> because this series does not use that approach?  I had to read the six
> lines above twice to figure out that it essentially is saying "I
> shouldn't care", but please let me know if I misread the paragraph and I
> need to care ;-)

Well, you might want to worry about it. Or not.

The approach taken by this patch series is to call `unsetenv()` for the
variable names listed in `core.unsetenvvars` (if any), just before
spawning off the first process (if any).

What I meant by this comment in the cover letter is that I thought about
doing it differently. We already have a perfectly fine function called
`make_environment_block()` that takes a "deltaenv", and then constructs a
new environment block from the current environment plus deltaenv. And this
would be an obvious alternative place to "unset" the variables, as this
function is only called just before spawning new processes.

I was weighing both options, and both back then and right now there are
other patches in flight that conflict with the second approach, so the
first approach is what I went with.

From a purely aesthetic point of view, the second approach looks nicer, as
it really only affects the spawned processes (and not the current one),
and it allows for changing the list between spawning processes.

But to do it right, I would also have to use a hash set, and it would
complicate the code of `make_environment_block()` even further. And it
sounds a bit like overkill to me, too.

So I sided with the approach presented in the current revision of the
patch series, but I wanted to describe the other approach in case you (or
other reviewers) have a different opinion.
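The two approaches weighed above can be illustrated in Python (a sketch only; git's implementation is in C, and `UNSET_VARS` here stands in for the parsed core.unsetenvvars list): the first calls unsetenv() in the current process before the first spawn, the second filters the environment block handed to each child, leaving the parent untouched.

```python
import os
import subprocess
import sys

# Stand-in for the core.unsetenvvars list; PERL5LIB is the default
# mentioned in the series.
UNSET_VARS = {"PERL5LIB"}

def spawn_approach_1(cmd):
    """Approach taken by the series: unset the variables in the
    *current* process before spawning (analogous to unsetenv())."""
    for name in UNSET_VARS:
        os.environ.pop(name, None)
    return subprocess.run(cmd, capture_output=True, text=True)

def spawn_approach_2(cmd):
    """Rejected alternative: build a filtered environment block for the
    child only, analogous to filtering in make_environment_block();
    the parent's environment stays untouched."""
    child_env = {k: v for k, v in os.environ.items() if k not in UNSET_VARS}
    return subprocess.run(cmd, env=child_env, capture_output=True, text=True)

if __name__ == "__main__":
    os.environ["PERL5LIB"] = "/some/external/perl/lib"
    probe = [sys.executable, "-c",
             "import os; print('PERL5LIB' in os.environ)"]
    print(spawn_approach_2(probe).stdout.strip())  # child does not see it
    print("PERL5LIB" in os.environ)                # but the parent still does
    print(spawn_approach_1(probe).stdout.strip())  # child does not see it
    print("PERL5LIB" in os.environ)                # and now neither does the parent
```

This also makes the aesthetic point above concrete: approach 2 only affects spawned processes and allows changing the list between spawns, at the cost of filtering on every spawn.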

> > While at it, this patch series also cleans up house and moves the
> > Windows-specific core.* variable handling to compat/mingw.c rather
> > than cluttering environment.c and config.c with things that e.g.
> > developers on Linux do not want to care about.
> 
> Or Macs.

Or macOS. Or FreeBSD. Or Irix. Or any other, that's right ;-)

Traditionally, we did not really care all that much about platforms other
than Linux, though, which is what made me write what I wrote. Having said
that, I get the impression that this is changing slowly. The benefits are
pretty clear, too. After all, it was a *Windows* build failure recently
that let Peff identify and fix a *non-Windows* bug...

> When I skimmed this series earlier, I found that patches 2 & 3 were
> sensibly implemented to achieve this goal.

Thanks!
Dscho

> 
> >
> > Johannes Schindelin (4):
> >   config: rename `dummy` parameter to `cb` in git_default_config()
> >   Allow for platform-specific core.* config settings
> >   Move Windows-specific config settings into compat/mingw.c
> >   mingw: unset PERL5LIB by default
> >
> >  Documentation/config.txt |  6 
> >  cache.h  |  8 -
> >  compat/mingw.c   | 58 +++-
> >  compat/mingw.h   |  3 ++
> >  config.c | 18 ---
> >  environment.c|  1 -
> >  git-compat-util.h|  8 +
> >  t/t0029-core-unsetenvvars.sh | 30 +++
> >  8 files changed, 109 insertions(+), 23 deletions(-)
> >  create mode 100755 t/t0029-core-unsetenvvars.sh
> >
> >
> > base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> > Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/62
> 


Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-30 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" writes:

> An alternative approach which was rejected at the time (because it
> interfered with the then-ongoing work to compile Git for Windows using MS
> Visual C++) would patch the make_environment_block() function to skip the
> specified environment variables before handing down the environment block to
> the spawned process. Currently it would interfere with the mingw-utf-8-env
> patch series I sent earlier today
> [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

So the rejected approach that was not used in this series would
interfere with the other one, but I do not have to worry about it
because this series does not use that approach?  I had to read the
six lines above twice to figure out that it essentially is saying "I
shouldn't care", but please let me know if I misread the paragraph
and I need to care ;-)

> While at it, this patch series also cleans up house and moves the
> Windows-specific core.* variable handling to compat/mingw.c rather than
> cluttering environment.c and config.c with things that e.g. developers on
> Linux do not want to care about.

Or Macs.  When I skimmed this series earlier, I found that patches 2
& 3 were sensibly implemented to achieve this goal.

>
> Johannes Schindelin (4):
>   config: rename `dummy` parameter to `cb` in git_default_config()
>   Allow for platform-specific core.* config settings
>   Move Windows-specific config settings into compat/mingw.c
>   mingw: unset PERL5LIB by default
>
>  Documentation/config.txt |  6 
>  cache.h  |  8 -
>  compat/mingw.c   | 58 +++-
>  compat/mingw.h   |  3 ++
>  config.c | 18 ---
>  environment.c|  1 -
>  git-compat-util.h|  8 +
>  t/t0029-core-unsetenvvars.sh | 30 +++
>  8 files changed, 109 insertions(+), 23 deletions(-)
>  create mode 100755 t/t0029-core-unsetenvvars.sh
>
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/62


[PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-30 Thread Johannes Schindelin via GitGitGadget
In Git for Windows, we do not support running the Perl scripts with any
random Perl interpreter. Instead, we insist on using the shipped one (except
for MinGit, where we do not even ship the Perl scripts, to save on space).

However, if Git is called from, say, a Perl script running in a different
Perl interpreter with an appropriately configured PERL5LIB, that stray
PERL5LIB interferes with Git's ability to run its own Perl scripts.

For that reason, we devised the presented method of defining a list of
environment variables (via core.unsetEnvVars, defaulting to PERL5LIB)
that are unset before spawning any process.

An alternative approach which was rejected at the time (because it
interfered with the then-ongoing work to compile Git for Windows using MS
Visual C++) would patch the make_environment_block() function to skip the
specified environment variables before handing down the environment block to
the spawned process. Currently it would interfere with the mingw-utf-8-env
patch series I sent earlier today
[https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

While at it, this patch series also cleans up house and moves the
Windows-specific core.* variable handling to compat/mingw.c rather than
cluttering environment.c and config.c with things that e.g. developers on
Linux do not want to care about.

Johannes Schindelin (4):
  config: rename `dummy` parameter to `cb` in git_default_config()
  Allow for platform-specific core.* config settings
  Move Windows-specific config settings into compat/mingw.c
  mingw: unset PERL5LIB by default

 Documentation/config.txt |  6 
 cache.h  |  8 -
 compat/mingw.c   | 58 +++-
 compat/mingw.h   |  3 ++
 config.c | 18 ---
 environment.c|  1 -
 git-compat-util.h|  8 +
 t/t0029-core-unsetenvvars.sh | 30 +++
 8 files changed, 109 insertions(+), 23 deletions(-)
 create mode 100755 t/t0029-core-unsetenvvars.sh


base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/62
-- 
gitgitgadget


Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking

2018-10-30 Thread Junio C Hamano
Geert Jansen  writes:

> Maybe a better option would be to check for the non-existence of the [00-ff]
> directories under .git/objects.

Please do not do this; I expect many people do this before they
leave work, just like I do:

$ git repack -a -d -f --window=$largs --depth=$small
$ git prune

which would typically leave only info/ and pack/ subdirectories
under .git/objects/ directory.


Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking

2018-10-29 Thread Geert Jansen
On Sun, Oct 28, 2018 at 10:50:19PM +, Ævar Arnfjörð Bjarmason wrote:

> I left the door open for that in the new config option 4/4 implements,
> but I suspect for Geert's purposes this is something he'd prefer to
> turn off in git on clone entirely, i.e. because it may be running on
> some random Amazon customer's EFS instance, and they won't know
> about this new core.checkCollisions option.
> 
> But maybe I'm wrong about that and Geert is happy to just set
> core.checkCollisions=false and use this series instead.

I think that the best user experience would probably be if git were fast by
default without having to give up on (future) security by removing the sha1
collision check.  Maybe core.checkCollisions could default to "on" only when
there are no loose objects in the repository? That would give a fast experience
for many common cases (git clone, git init && git fetch) while still doing the
collision check when relevant.

My patch used the --cloning flag as an approximation of "no loose objects".
Maybe a better option would be to check for the non-existence of the [00-ff]
directories under .git/objects.
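The proposed heuristic ("no [00-ff] fan-out directories means no loose objects") can be sketched as follows; this is illustrative Python, not git code, and the function name is invented. Note that, as Junio points out, a freshly repacked and pruned repository also has only info/ and pack/ left, so the heuristic cannot distinguish "new clone" from "ordinary repacked repo".

```python
import os
import re
import tempfile

# Loose objects live in two-hex-digit fan-out directories: 00/ .. ff/.
FANOUT_RE = re.compile(r"^[0-9a-f]{2}$")

def has_loose_object_fanout(objects_dir):
    """True if objects_dir contains any fan-out directory, i.e. the
    repository may hold loose objects."""
    try:
        entries = os.listdir(objects_dir)
    except FileNotFoundError:
        return False
    return any(FANOUT_RE.match(e) and os.path.isdir(os.path.join(objects_dir, e))
               for e in entries)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        objects = os.path.join(d, ".git", "objects")
        os.makedirs(os.path.join(objects, "pack"))
        os.makedirs(os.path.join(objects, "info"))
        # A repacked+pruned repository looks exactly like this, which is
        # the objection: the check also fires on ordinary repositories.
        print(has_loose_object_fanout(objects))  # False
        os.makedirs(os.path.join(objects, "a7"))
        print(has_loose_object_fanout(objects))  # True
```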


[PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking

2018-10-28 Thread Ævar Arnfjörð Bjarmason
This patch series implements what I suggested in
https://public-inbox.org/git/87lg6jljmf@evledraar.gmail.com/

It's not a replacement for what Geert Jansen's RFC in
https://public-inbox.org/git/ed25e182-c296-4d08-8170-340567d89...@amazon.com/
does, which was to turn this off entirely on "clone".

I left the door open for that in the new config option 4/4 implements,
but I suspect for Geert's purposes this is something he'd prefer to
turn off in git on clone entirely, i.e. because it may be running on
some random Amazon customer's EFS instance, and they won't know
about this new core.checkCollisions option.

But maybe I'm wrong about that and Geert is happy to just set
core.checkCollisions=false and use this series instead.

Ævar Arnfjörð Bjarmason (4):
  pack-objects test: modernize style
  pack-objects tests: don't leave test .git corrupt at end
  index-pack tests: don't leave test repo dirty at end
  index-pack: add ability to disable SHA-1 collision check

 Documentation/config.txt | 68 
 builtin/index-pack.c |  7 ++--
 cache.h  |  1 +
 config.c | 20 +++
 config.h |  1 +
 environment.c|  1 +
 t/README |  3 ++
 t/t1060-object-corruption.sh | 37 +++-
 t/t5300-pack-object.sh   | 51 +++
 9 files changed, 163 insertions(+), 26 deletions(-)

-- 
2.19.1.759.g500967bb5e



Re: [PATCH 0/4] Bloom filter experiment

2018-10-17 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_ big
>> filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.
>
> I.e. we might later add some discovery step where we look at N number of
> commits at random, until we're satisfied that we've come up with some
> average/median number of total (recursive) tree entries & how many tend
> to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k files
> where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I was more interested to find out what the advice our docs should
give to the end users to tune the value once such a knob is
invented, and what you gave in the above paragraphs may lead us to a
nice auto-tuning heuristics.



Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Jonathan Tan
> | Implementation | Queries | Maybe | FP # | FP %  |
> |----------------|---------|-------|------|-------|
> | Szeder         | 66095   | 1142  | 256  | 0.38% |
> | Jonathan       | 66459   | 107   | 89   | 0.16% |
> | Stolee         | 53025   | 492   | 479  | 0.90% |
> 
> (Note that we must have used different starting points, which is why my
> "Queries" is so much smaller.)

I suspect it's because your bloom filter implementation covers only the
first parent (if I'm understanding get_bloom_filter() correctly). When I
only covered the first parent in my initial test (see patch 2 of [1]), I
got (following the columns in the table above):

  53096 107 89 0.001676

Also, I think that the rejecting power (Queries - Maybe)/(Total tree
comparisons if no bloom filters were used) needs to be in the evaluation
criteria somewhere, as that indicates how many tree comparisons we
managed to avoid.

Also, we probably should test on a file that changes more
frequently :-)

[1] https://public-inbox.org/git/cover.1539219248.git.jonathanta...@google.com/

> The increase in false-positive percentage is expected in my 
> implementation. I'm using the correct filter sizes to hit a <1% FP 
> ratio. This could be lowered by changing the settings, and the size 
> would dynamically grow. For my Git repo (which contains 
> git-for-windows/git and microsoft/git) this implementation grows the 
> commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to 
> Szeder's 8MB filter). For 105,260 commits, that rounds out to less than 
> 20 bytes per commit (compared to Jonathan's 256 bytes per commit).

Mine has 256 bits per commit, which is 32 bytes per commit (still more
than yours).

Having said all that, thanks for writing up your version - in
particular, variable sized filters (like in yours) seem to be the way to
go.

> We'll see how much time I have to do this polish, but I think the 
> benefit is proven.

Agreed.


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Derrick Stolee

On 10/16/2018 8:57 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Oct 16 2018, Derrick Stolee wrote:
>> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>>> Derrick Stolee  writes:
>>>> 2. The filters are sized according to the number of changes in each
>>>> commit, with a minimum of one 64-bit word.
>>>> ...
>>>> 6. When we compute the Bloom filters, we don't store a filter for
>>>> commits whose first-parent diff has more than 512 paths.
>>> Just being curious but was 512 taken out of thin air or is there
>>> some math behind it, e.g. to limit false positive rate down to
>>> certain threshold?  With a wide-enough bitset, you could store
>>> arbitrary large number of paths with low enough false positive, I
>>> guess, but is there a point where there is too many paths in the
>>> change that gives us diminishing returns and not worth having a
>>> filter in the first place?
>> 512 is somewhat arbitrary, but having a maximum size is not.
>>> In a normal source-code-control context, the set of paths modified
>>> by any single commit ought to be a small subset of the entire paths,
>>> and whole-tree changes ought to be fairly rare.  In a project for
>>> which that assumption does not hold, it might help to have a
>>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>>> modify A?" and the filter would say "we know it does not, because we
>>> threw all the paths that are not touched to the bloom filter"), but
>>> I think that would optimize for a wrong case.
>>
>> A commit with many changed paths is very rare. The 512 I picked above
>> is enough to cover 99% of commits in each of the repos I sampled when
>> first investigating Bloom filters.
>>
>> When a Bloom filter response says "maybe yes" (in our case, "maybe not
>> TREESAME"), then we need to verify that it is correct. In the extreme
>> case that every path is changed, then the Bloom filter does nothing
>> but add extra work.
>>
>> These extreme cases are also not unprecedented: in our Azure Repos
>> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
>> when it was time to dogfood VFS for Git, we needed to turn off the
>> smudge filter. So, there is one commit that converts every LF to a
>> CRLF in every text file. Storing a Bloom filter for those ~250,000
>> entries would take ~256KB for essentially no value. By not storing a
>> filter for this commit, we go immediately to the regular TREESAME
>> check, which would happen for most pathspecs.
>>
>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_ big
>> filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.

My proof-of-concept has it as a constant, but part of my plan is to make
these all config options, as of this item in my message:

>>> 2. We need config values for writing and consuming bloom filters,
>>> but also to override the default settings.

> I.e. we might later add some discovery step where we look at N number of
> commits at random, until we're satisfied that we've come up with some
> average/median number of total (recursive) tree entries & how many tend
> to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k files
> where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I'm not sure a dynamic approach would be worth the effort, but I'm open
to hearing the results of an experiment.


Thanks,
-Stolee


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Ævar Arnfjörð Bjarmason


On Tue, Oct 16 2018, Derrick Stolee wrote:

> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>> Derrick Stolee  writes:
>>
>>> 2. The filters are sized according to the number of changes in each
>>> commit, with a minimum of one 64-bit word.
>>> ...
>>> 6. When we compute the Bloom filters, we don't store a filter for
>>> commits whose first-parent diff has more than 512 paths.
>> Just being curious but was 512 taken out of thin air or is there
>> some math behind it, e.g. to limit false positive rate down to
>> certain threshold?  With a wide-enough bitset, you could store
>> arbitrary large number of paths with low enough false positive, I
>> guess, but is there a point where there is too many paths in the
>> change that gives us diminishing returns and not worth having a
>> filter in the first place?
> 512 is somewhat arbitrary, but having a maximum size is not.
>> In a normal source-code-control context, the set of paths modified
>> by any single commit ought to be a small subset of the entire paths,
>> and whole-tree changes ought to be fairly rare.  In a project for
>> which that assumption does not hold, it might help to have a
>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>> modify A?" and the filter would say "we know it does not, because we
>> threw all the paths that are not touched to the bloom filter"), but
>> I think that would optimize for a wrong case.
>
> A commit with many changed paths is very rare. The 512 I picked above
> is enough to cover 99% of commits in each of the repos I sampled when
> first investigating Bloom filters.
>
> When a Bloom filter response says "maybe yes" (in our case, "maybe not
> TREESAME"), then we need to verify that it is correct. In the extreme
> case that every path is changed, then the Bloom filter does nothing
> but add extra work.
>
> These extreme cases are also not unprecedented: in our Azure Repos
> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
> when it was time to dogfood VFS for Git, we needed to turn off the
> smudge filter. So, there is one commit that converts every LF to a
> CRLF in every text file. Storing a Bloom filter for those ~250,000
> entries would take ~256KB for essentially no value. By not storing a
> filter for this commit, we go immediately to the regular TREESAME
> check, which would happen for most pathspecs.
>
> This is all to say: having a maximum size is good. 512 is big enough
> to cover _most_ commits, but not so big that we may store _really_ big
> filters.

Makes sense. 512 is good enough to hardcode initially, but I couldn't
tell from briefly skimming the patches if it was possible to make this
size dynamic per-repo when the graph/filter is written.

I.e. we might later add some discovery step where we look at N number of
commits at random, until we're satisfied that we've come up with some
average/median number of total (recursive) tree entries & how many tend
to be changed per-commit.

I.e. I can imagine repositories (with some automated changes) where we
have 10k files and tend to change 1k per commit, or ones with 10k files
where we tend to change just 1-10 per commit, which would mean a
larger/smaller filter would be needed / would do.


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Derrick Stolee

On 10/16/2018 12:45 AM, Junio C Hamano wrote:
> Derrick Stolee  writes:
>> 2. The filters are sized according to the number of changes in each
>> commit, with a minimum of one 64-bit word.
>> ...
>> 6. When we compute the Bloom filters, we don't store a filter for
>> commits whose first-parent diff has more than 512 paths.
> Just being curious but was 512 taken out of thin air or is there
> some math behind it, e.g. to limit false positive rate down to
> certain threshold?  With a wide-enough bitset, you could store
> arbitrary large number of paths with low enough false positive, I
> guess, but is there a point where there is too many paths in the
> change that gives us diminishing returns and not worth having a
> filter in the first place?

512 is somewhat arbitrary, but having a maximum size is not.

> In a normal source-code-control context, the set of paths modified
> by any single commit ought to be a small subset of the entire paths,
> and whole-tree changes ought to be fairly rare.  In a project for
> which that assumption does not hold, it might help to have a
> negative bloom filter (i.e. "git log -- A" asks "does the commit
> modify A?" and the filter would say "we know it does not, because we
> threw all the paths that are not touched to the bloom filter"), but
> I think that would optimize for a wrong case.


A commit with many changed paths is very rare. The 512 I picked above is 
enough to cover 99% of commits in each of the repos I sampled when first 
investigating Bloom filters.


When a Bloom filter response says "maybe yes" (in our case, "maybe not 
TREESAME"), then we need to verify that it is correct. In the extreme 
case that every path is changed, then the Bloom filter does nothing but 
add extra work.


These extreme cases are also not unprecedented: in our Azure Repos 
codebase, we were using core.autocrlf to smudge CRLFs to LFs, but when 
it was time to dogfood VFS for Git, we needed to turn off the smudge 
filter. So, there is one commit that converts every LF to a CRLF in 
every text file. Storing a Bloom filter for those ~250,000 entries would 
take ~256KB for essentially no value. By not storing a filter for this 
commit, we go immediately to the regular TREESAME check, which would 
happen for most pathspecs.


This is all to say: having a maximum size is good. 512 is big enough to 
cover _most_ commits, but not so big that we may store _really_ big filters.


Thanks,
-Stolee



Re: [PATCH 0/4] Bloom filter experiment

2018-10-15 Thread Junio C Hamano
Derrick Stolee  writes:

> 2. The filters are sized according to the number of changes in each
> commit, with a minimum of one 64-bit word.
> ...
> 6. When we compute the Bloom filters, we don't store a filter for
> commits whose first-parent diff has more than 512 paths.

Just being curious but was 512 taken out of thin air or is there
some math behind it, e.g. to limit false positive rate down to
certain threshold?  With a wide-enough bitset, you could store
arbitrary large number of paths with low enough false positive, I
guess, but is there a point where there is too many paths in the
change that gives us diminishing returns and not worth having a
filter in the first place?

In a normal source-code-control context, the set of paths modified
by any single commit ought to be a small subset of the entire paths,
and whole-tree changes ought to be fairly rare.  In a project for
which that assumption does not hold, it might help to have a
negative bloom filter (i.e. "git log -- A" asks "does the commit
modify A?" and the filter would say "we know it does not, because we
threw all the paths that are not touched to the bloom filter"), but
I think that would optimize for a wrong case.
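For reference, the standard Bloom-filter false-positive estimate behind this kind of question is p ≈ (1 - e^(-kn/m))^k for n items, m bits and k hash functions. A quick computation (with an assumed 10 bits per path and k = 7, not necessarily the series' actual settings) shows that capping at 512 paths is about storage rather than accuracy: as long as the filter grows with the change count, the FP rate stays flat.

```python
import math

def bloom_fp_rate(n_items, m_bits, k_hashes):
    """Standard Bloom filter false-positive estimate:
    p ~= (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Filter sized proportionally to the change count (assumed 10 bits per
# path, k = 7): the FP rate is the same no matter how many paths a
# commit touches.
for n in (8, 64, 512):
    print(f"{n:4d} paths, {10 * n:5d} bits -> FP ~ {bloom_fp_rate(n, 10 * n, 7):.4f}")

# ...so the cap limits storage, not accuracy: a ~250,000-path commit
# would need ~2.5 Mbits just to keep the same rate.
print(250_000 * 10 // 8, "bytes")   # ~300 KB
```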



Re: [PATCH 0/4] Bloom filter experiment

2018-10-15 Thread Derrick Stolee

On 10/9/2018 3:34 PM, SZEDER Gábor wrote:
> To keep the ball rolling, here is my proof of concept in a somewhat
> cleaned-up form, with still plenty of rough edges.


Peff, Szeder, and Jonathan,

Thanks for giving me the kick in the pants to finally write a proof of 
concept for my personal take on how this should work. My implementation 
borrows things from both Szeder and Jonathan's series. You can find my 
commits for all of the versions on GitHub (it's a bit too messy to share 
as a patch series right now, I think):


Repo: https://github.com/derrickstolee/git
Branches: bloom/* (includes bloom/stolee, bloom/peff, bloom/szeder, and 
bloom/tan for the respective implementations, and bloom/base as the 
common ancestor)


My implementation uses the following scheme:

1. Bloom filters are computed and stored on a commit-by-commit basis.

2. The filters are sized according to the number of changes in each 
commit, with a minimum of one 64-bit word.


3. The filters are stored in the commit-graph using two new optional 
chunks: one stores a single 32-bit integer for each commit that provides 
the end of its Bloom filter in the second "data" chunk. The data chunk 
also stores the magic constants (hash version, num hash keys, and num 
bits per entry).


4. We fill the Bloom filters as (const char *data, int len) pairs as 
"struct bloom_filter"s in a commit slab.


5. In order to evaluate containment, we need the struct bloom_filter, 
but also struct bloom_settings (stores the magic constants in one 
place), and struct bloom_key (stores the _k_ hash values). This allows 
us to hash a path once and test the same path against many Bloom filters.


6. When we compute the Bloom filters, we don't store a filter for 
commits whose first-parent diff has more than 512 paths.


7. When we compute the commit-graph, we can re-use the pre-existing 
filters without needing to recompute diffs. (Caveat: the current 
implementation will re-compute diffs for the commits with diffs that 
were too large.)
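The per-commit scheme in points 1-7 can be sketched in plain C. This is an illustration only, not the code on the bloom/stolee branch: the hash function, the number of probes (`BLOOM_NR_HASHES`), and the double-hashing step are all assumptions made for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch only -- not the actual bloom/stolee code.
 * One filter per commit; a bloom_key hashes a path once so the same
 * key can be tested against many commits' filters (point 5 above).
 */
#define BLOOM_NR_HASHES 7	/* assumed; the real constant is configurable */

struct bloom_filter {
	uint64_t *words;	/* minimum of one 64-bit word (point 2) */
	size_t nr_words;
};

struct bloom_key {
	uint32_t hashes[BLOOM_NR_HASHES];
};

/* Stand-in hash (seeded FNV-1a); the real implementation may differ. */
static uint32_t seeded_hash(const char *data, size_t len, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	size_t i;
	for (i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash the path once; reuse the key against many filters. */
static void bloom_key_fill(struct bloom_key *key, const char *path)
{
	uint32_t h1 = seeded_hash(path, strlen(path), 0);
	uint32_t h2 = seeded_hash(path, strlen(path), 0x9e3779b9u);
	int i;
	for (i = 0; i < BLOOM_NR_HASHES; i++)
		key->hashes[i] = h1 + (uint32_t)i * h2;	/* double hashing */
}

static void bloom_add(struct bloom_filter *f, const struct bloom_key *key)
{
	uint64_t nbits = (uint64_t)f->nr_words * 64;
	int i;
	for (i = 0; i < BLOOM_NR_HASHES; i++) {
		uint64_t bit = key->hashes[i] % nbits;
		f->words[bit / 64] |= (uint64_t)1 << (bit % 64);
	}
}

/* Returns 0 for "definitely not changed", 1 for "maybe changed". */
static int bloom_maybe_contains(const struct bloom_filter *f,
				const struct bloom_key *key)
{
	uint64_t nbits = (uint64_t)f->nr_words * 64;
	int i;
	for (i = 0; i < BLOOM_NR_HASHES; i++) {
		uint64_t bit = key->hashes[i] % nbits;
		if (!(f->words[bit / 64] & ((uint64_t)1 << (bit % 64))))
			return 0;
	}
	return 1;
}
```

A revision walk would build one bloom_key per pathspec entry and skip the diff for any commit whose filter answers "definitely not"; a "maybe" answer still requires computing the real diff, which is where false positives cost time.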


You can build the Bloom filters in my implementation this way:

GIT_TEST_BLOOM_FILTERS=1 ./git commit-graph write --reachable


You can play around with it like this:

   $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
   Computing commit graph generation numbers: 100% (52801/52801), done.
   Computing bloom filter: 100% (52801/52801), done.
   # Yeah, I even added progress indicator! :)
   $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count 
--full-history HEAD -- t/valgrind/valgrind.sh
   886
   20:40:24.783699 revision.c:486  bloom filter total queries: 66095 
definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873
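As a sanity check on that trace line: the "definitely not" and "maybe" counts sum to the total, and "fp ratio" appears to be false positives divided by total queries (256 / 66095 is roughly 0.003873). A couple of lines make that interpretation concrete:

```c
#include <assert.h>

/*
 * fp ratio as reported in the trace line, assuming it means
 * false positives / total queries (this matches 256 / 66095 ~= 0.003873).
 */
static double fp_ratio(double false_positives, double total_queries)
{
	return false_positives / total_queries;
}
```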


Jonathan used this same test, so I will, too. Here is a summary table:

| Implementation | Queries | Maybe | FP # | FP %  |
|----------------|---------|-------|------|-------|
| Szeder | 66095   | 1142  | 256  | 0.38% |
| Jonathan   | 66459   | 107   | 89   | 0.16% |
| Stolee | 53025   | 492   | 479  | 0.90% |

(Note that we must have used different starting points, which is why my 
"Queries" is so much smaller.)


The increase in false-positive percentage is expected in my 
implementation. I'm using the correct filter sizes to hit a <1% FP 
ratio. This could be lowered by changing the settings, and the size 
would dynamically grow. For my Git repo (which contains 
git-for-windows/git and microsoft/git) this implementation grows the 
commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to 
Szeder's 8MB filter). For 105,260 commits, that rounds out to less than 
20 bytes per commit (compared to Jonathan's 256 bytes per commit).


Related stats for my Linux repo: 781,756 commits, commit-graph grows 
from 43.8 to 55.6 MB (~12 MB additional, ~16 bytes per commit).


I haven't done a side-by-side performance test for these 
implementations, but it would be interesting to do so.


Despite writing a lot of code in a short amount of time, there is a lot 
of work to be done before this is submittable:


1. There are three different environment variables right now. It would 
be better to have one GIT_TEST_ variable and rely on existing tracing 
for logs (trace2 values would be good here).


2. We need config values for writing and consuming bloom filters, but 
also to override the default settings.


3. My bloom.c/bloom.h is too coupled to the commit-graph. I want to 
harden that interface to let Bloom filters live as their own thing, so 
that the commit-graph could load a bloom filter from the file instead of 
from the slab.


4. Tests, tests, and more tests.

We'll see how much time I have to do this polish, but I think the 
benefit is proven.


Thanks,
-Stolee


[PATCH 0/4] More merge cleanups

2018-10-12 Thread Elijah Newren
This series builds on en/merge-cleanup.  Technically, this could be
broken into three separate topics with only one of them depending on
en/merge-cleanup, but I have a subsequent series that depends on both
en/merge-cleanup and the fixes here, so I'm submitting these four
patches as a set.

Elijah Newren (4):
  t6036: add testcase where virtual merge base contains nested conflicts
  merge-recursive: increase marker length with depth of recursion

Improving diff3 conflict markers in the face of arbitrarily deeply
nested conflicts

  merge-recursive: improve auto-merging messages with path collisions

Simple attempt at wording improvement

  merge-recursive: Avoid showing conflicts with merge branch before HEAD

Conflict markers are expected to be shown in a certain order

 ll-merge.c|   4 +-
 ll-merge.h|   1 +
 merge-recursive.c |  59 +--
 t/t6036-recursive-corner-cases.sh | 159 +-
 4 files changed, 208 insertions(+), 15 deletions(-)

-- 
2.19.0.235.g7c386e1068



Re: [PATCH 0/4] Bloom filter experiment

2018-10-09 Thread Derrick Stolee

On 10/9/2018 3:34 PM, SZEDER Gábor wrote:

To keep the ball rolling, here is my proof of concept in a somewhat
cleaned-up form, with still plenty of rough edges.

You can play around with it like this:

   $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
   Computing commit graph generation numbers: 100% (52801/52801), done.
   Computing bloom filter: 100% (52801/52801), done.
   # Yeah, I even added progress indicator! :)
   $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count 
--full-history HEAD -- t/valgrind/valgrind.sh
   886
   20:40:24.783699 revision.c:486  bloom filter total queries: 66095 
definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873

The value of $GIT_USE_POC_BLOOM_FILTER only really matters when writing
the Bloom filter, and it specifies the number of bits in the filter's
bitmap, IOW the above command creates an 8MB Bloom filter.  To make use
of the filter the variable can be anything non-empty.

Writing the Bloom filter is very slow as it is (yeah, that's why I
bothered with the progress indicator ;).  I wrote about it in patch 2's
commit message: the cause for about half of the slowness is rather
obvious, but I don't (yet) know what's responsible for the other half.


Not a single test...  but I've run loops over all files in git.git
comparing 'git rev-list HEAD -- $file's output with and without the
Bloom filter, and, surprisingly, they match.  My quick'n'dirty
experiments usually don't fare this well...


It's also available at:

   https://github.com/szeder/git bloom-filter-experiment


Thanks! I will take a close look at this tomorrow and start playing with it.

-Stolee



[PATCH 0/4] Bloom filter experiment

2018-10-09 Thread SZEDER Gábor
To keep the ball rolling, here is my proof of concept in a somewhat
cleaned-up form, with still plenty of rough edges.

You can play around with it like this:

  $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
  Computing commit graph generation numbers: 100% (52801/52801), done.
  Computing bloom filter: 100% (52801/52801), done.
  # Yeah, I even added progress indicator! :)
  $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count 
--full-history HEAD -- t/valgrind/valgrind.sh
  886
  20:40:24.783699 revision.c:486  bloom filter total queries: 66095 
definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873

The value of $GIT_USE_POC_BLOOM_FILTER only really matters when writing
the Bloom filter, and it specifies the number of bits in the filter's
bitmap, IOW the above command creates an 8MB Bloom filter.  To make use
of the filter the variable can be anything non-empty.
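For the record, the arithmetic behind that value: $((8*1024*1024*8)) requests 67,108,864 bits, i.e. an 8 MB bitmap. A hypothetical sizing helper (names here are invented for illustration, not taken from the patches) would round a requested bit count up to whole 64-bit words:

```c
#include <stdint.h>

/* Hypothetical helpers; names are illustrative, not from the patches. */
static uint64_t bitmap_words(uint64_t nbits)
{
	return (nbits + 63) / 64;	/* round up to whole 64-bit words */
}

static uint64_t bitmap_bytes(uint64_t nbits)
{
	return bitmap_words(nbits) * 8;
}
```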

Writing the Bloom filter is very slow as it is (yeah, that's why
bothered with the progress indicator ;).  I wrote about it in patch 2's
commit message: the cause for about half of the slowness is rather
obvious, but I don't (yet) know what's responsible for the other half.


Not a single test...  but I've run loops over all files in git.git
comparing 'git rev-list HEAD -- $file's output with and without the
Bloom filter, and, surprisingly, they match.  My quick'n'dirty
experiments usually don't fare this well...


It's also available at:

  https://github.com/szeder/git bloom-filter-experiment


SZEDER Gábor (4):
  Add a (very) barebones Bloom filter implementation
  commit-graph: write a Bloom filter containing changed paths for each
commit
  revision.c: use the Bloom filter to speed up path-limited revision
walks
  revision.c: add GIT_TRACE_BLOOM_FILTER for a bit of statistics

 Makefile   |   1 +
 bloom-filter.c | 103 +++
 bloom-filter.h |  39 +++
 commit-graph.c | 116 
 pathspec.h |   1 +
 revision.c | 129 +
 6 files changed, 389 insertions(+)
 create mode 100644 bloom-filter.c
 create mode 100644 bloom-filter.h

-- 
2.19.1.409.g0a0ee5eb6b



[PATCH 0/4] Multiple subtree split fixes regarding complex repos

2018-09-28 Thread Strain, Roger L
We recently (about eight months ago) transitioned to git source control systems 
for several very large, very complex systems. We brought over several active 
versions requiring maintenance updates, and also set up several subtree repos 
to manage code shared between the systems. Recently, we attempted to push 
updates back to those subtrees and encountered errors. I believe I have 
identified and corrected the errors we found in our repos, and would like to 
contribute those fixes back.

Commands to demonstrate both failures using the current version of the subtree 
script are here:
https://gist.github.com/FoxFireX/1b794384612b7fd5e7cd157cff96269e

Short summary of three problems involved:
1. Split using rejoins fails in some cases where a commit has a parent which 
was a parent commit further upstream from a rejoin, causing a new initial 
commit to be created, which is not related to the original subtree commits.
2. Split using rejoins fails to generate a merge commit which may have triaged 
the previous problem, but instead elected to use only the parent which is not 
connected to the original subtree commits. (This may occur when the commit and 
both parents all share the same subtree hash.)
3. Split ignoring joins also ignores the original add commit, which causes 
content prior to the add to be considered part of the subtree graph, changing 
the commit hashes so it is not connected to the original subtree commits.

The following commits address each problem individually, along with a single 
commit that makes no functional change but performs a small refactor of the 
existing code. Hopefully that will make reviewing it a simpler task. This is my 
first attempt at submitting a patch back, so apologies if I've made any errors 
in the process.

Strain, Roger L (4):
  subtree: refactor split of a commit into standalone method
  subtree: make --ignore-joins pay attention to adds
  subtree: use commits before rejoins for splits
  subtree: improve decision on merges kept in split

 contrib/subtree/git-subtree.sh | 129 +
 1 file changed, 83 insertions(+), 46 deletions(-)

-- 
2.19.0.windows.1



[PATCH 0/4] git-commit-graph.txt: various cleanups

2018-09-19 Thread Martin Ågren
The first patch is a bug-fix. The second applies some more
`monospace`-ing, which should also be a good thing.

The last two patches are based on my understanding that `git
commit-graph` handles the "commit graph file", without a dash. If that's
correct, there might be more such cleanups to be made in other parts of
git.git. If the dash should actually be there, I could do these changes
in the other direction. Or maybe dash-vs-no-dash is not an actual
problem at all...

Martin

Martin Ågren (4):
  git-commit-graph.txt: fix bullet lists
  git-commit-graph.txt: typeset more in monospace
  git-commit-graph.txt: refer to "*commit* graph file"
  git-commit-graph.txt: refer to the "commit graph file" without dash

 Documentation/git-commit-graph.txt | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

-- 
2.19.0.216.g2d3b1c576c



[PATCH 0/4] Cleanup of merge_*() functions in merge-recursive

2018-09-19 Thread Elijah Newren
While working on a series to make file collision conflict handling
consistent, I discovered a latent bug in merge_content()...and several
opportunities for cleanup.  This series fixes that bug, deletes the
merge_file_special_markers() and merge_file_one() functions, and
renames a pair of functions to give them more descriptive names.

Elijah Newren (4):
  merge-recursive: set paths correctly when three-way merging content
  merge-recursive: avoid wrapper function when unnecessary and wasteful
  merge-recursive: remove final remaining caller of merge_file_one()
  merge-recursive: rename merge_file_1() and merge_content()

 merge-recursive.c | 144 --
 1 file changed, 51 insertions(+), 93 deletions(-)

-- 
2.19.0.12.gc6760fd9a9



Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-10 Thread Junio C Hamano
Jeff King  writes:

>> Either is fine.  I am not moving 'next' beyond what is necessary for
>> this release cycle during the pre-release freeze period, and because
>> I thought that Peff was in favor of squashing in the changes to the
>> original when the next cycle starts, I haven't bothered to merge the
>> fix there yet.
>
> I think Ævar's point is just that the current tip of next is badly
> broken if you use bitmaps, and it's worth hot-fixing that in the
> meantime.

I know that ;-)

My point is that anybody who *needs* handholding from the upstream
project (i.e. not Ævar, but those whom he is trying to help with the
suggestion) should not be looking at 'next' during the prerelease
freeze.  They should be looking at 'master' instead, and that is why
I am not spending more than minimum cycles of my own worrying about
'next' during the prerelease.

In any case, I think the final is ready to go, almost.  I still need
to do the preformatted docs, though.



Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-10 Thread Jeff King
On Mon, Sep 10, 2018 at 09:53:56AM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
> 
> > I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> > building my own package of git, but I think this really should be fixed
> > in that branch, either by merging the fix down or reverting the original
> > series out of next, I think just merging the fix down makes sense, but
> > have no strong opinion on it.
> 
> Either is fine.  I am not moving 'next' beyond what is necessary for
> this release cycle during the pre-release freeze period, and because
> I thought that Peff was in favor of squashing in the changes to the
> original when the next cycle starts, I haven't bothered to merge the
> fix there yet.

I think Ævar's point is just that the current tip of next is badly
broken if you use bitmaps, and it's worth hot-fixing that in the
meantime.

After the release, I'm OK with either one of squashing that first patch
in, or just merging as-is.

-Peff


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-10 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> building my own package of git, but I think this really should be fixed
> in that branch, either by merging the fix down or reverting the original
> series out of next, I think just merging the fix down makes sense, but
> have no strong opinion on it.

Either is fine.  I am not moving 'next' beyond what is necessary for
this release cycle during the pre-release freeze period, and because
I thought that Peff was in favor of squashing in the changes to the
original when the next cycle starts, I haven't bothered to merge the
fix there yet.


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-08 Thread Ævar Arnfjörð Bjarmason


On Sat, Sep 01 2018, Jeff King wrote:

> On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:
>
>> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > On Tue, Aug 21 2018, Jeff King wrote:
>> >
>> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
>> > > + const unsigned char *sha1)
>> > > +{
>> > > +int pos;
>> > > +
>> > > +if (!bitmap_git)
>> > > +return 0; /* no bitmap loaded */
>> > > +if (!bitmap_git->result)
>> > > +BUG("failed to perform bitmap walk before querying");
>> >
>> > Some part of what calls this completely breaks pushing from the "next"
>> > branch when you have local bitmaps (we *really* should have some tests
>> > for this...).
>>
>> Yikes, thanks for reporting. I agree we need better tests here.
>
> OK, here is the fix. Since the problem is in 'next', this is done as a
> patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
> rewind 'next' post-release anyway, we could squash it directly into
> 30cdc33fba from the original series. That would help later bisections
> from running into it, which may be worth it as it's a pretty severe
> breakage.  Or maybe not:

Junio: Just a reminder that next is still broken with this, and I see
e.g. the Debian "experimental" has the bug but not the fix at this
point.

I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
building my own package of git, but I think this really should be fixed
in that branch, either by merging the fix down or reverting the original
series out of next, I think just merging the fix down makes sense, but
have no strong opinion on it.


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-04 Thread Jeff King
On Tue, Sep 04, 2018 at 12:30:14PM -0700, Stefan Beller wrote:

> >   [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check
> >
> > The actual fix. This should get merged to next ASAP (or the original
> > topic just reverted).
> >
> >   [2/4]: t5310: test delta reuse with bitmaps
> >
> > I did this separately to give us flexibility to squash or merge
> > quickly. But it does find Ævar's bug on a git without patch 1.
> >
> >   [3/4]: traverse_bitmap_commit_list(): don't free result
> >
> > The original assert should have simply been useless, but it was the
> > surprising behavior of this function that turned it into a bug.
> >
> >   [4/4]: pack-bitmap: drop "loaded" flag
> >
> > And this is just an annoyance I ran into, which is a fallout from
> > our conversion to using an allocated bitmap_index struct.
> 
> FWIW, the whole series is
> Reviewed-by: Stefan Beller 
> I am sorry for suggesting the BUG() in the first place.

Heh. You only asked a question about the interface. I was the one who
was _supposed_ to be familiar with the code, and put in the actual
assertion. So if your suggestion was dumb, my response was doubly so. :)

-Peff


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-04 Thread Stefan Beller
>   [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check
>
> The actual fix. This should get merged to next ASAP (or the original
> topic just reverted).
>
>   [2/4]: t5310: test delta reuse with bitmaps
>
> I did this separately to give us flexibility to squash or merge
> quickly. But it does find Ævar's bug on a git without patch 1.
>
>   [3/4]: traverse_bitmap_commit_list(): don't free result
>
> The original assert should have simply been useless, but it was the
> surprising behavior of this function that turned it into a bug.
>
>   [4/4]: pack-bitmap: drop "loaded" flag
>
> And this is just an annoyance I ran into, which is a fallout from
> our conversion to using an allocated bitmap_index struct.

FWIW, the whole series is
Reviewed-by: Stefan Beller 
I am sorry for suggesting the BUG() in the first place.

Stefan


[PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-01 Thread Jeff King
On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:

> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > On Tue, Aug 21 2018, Jeff King wrote:
> > 
> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
> > > +  const unsigned char *sha1)
> > > +{
> > > + int pos;
> > > +
> > > + if (!bitmap_git)
> > > + return 0; /* no bitmap loaded */
> > > + if (!bitmap_git->result)
> > > + BUG("failed to perform bitmap walk before querying");
> > 
> > Some part of what calls this completely breaks pushing from the "next"
> > branch when you have local bitmaps (we *really* should have some tests
> > for this...).
> 
> Yikes, thanks for reporting. I agree we need better tests here.

OK, here is the fix. Since the problem is in 'next', this is done as a
patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
rewind 'next' post-release anyway, we could squash it directly into
30cdc33fba from the original series. That would help later bisections
from running into it, which may be worth it as it's a pretty severe
breakage.  Or maybe not:

  1. The test suite doesn't actually fail, because its toy repos are
 too small.

  2. It only triggers in the real-world if you have bitmaps turned on,
 which are not the default.

So it may not be that likely in practice to bother a hypothetical future
bisecting developer.

> [1] Actually, there is also prepare_bitmap_git(), but it is not really
> for general use by callers. It should be made static, or better yet,
> I suspect it can be folded into its callers.

This actually turned out not to work. There's a caller over in
pack-bitmap-write.c, and it makes things worse to try to expand the
logic there. So it technically _is_ possible to have a bitmap_index
without a "have" field, but it also doesn't make sense to ask about
"uninteresting" objects there. You haven't done (and cannot do) a
traversal on such an object.

Which I think goes back to Stefan's original question: is this just a
crappy API? And the answer is "yes, to some degree". There are really
two uses of bitmaps:

  - you want to do a traverse_commit_list() walk, but faster

  - you want to selectively query the on-disk bitmaps (e.g., you are
walking for --contains and want to ask "do we have a bitmap for this
object?")

Those currently use the same struct bitmap_index, but with two different
constructors (prepare_bitmap_git and prepare_bitmap_walk). It probably
ought to be two different ones (with the "walk" variant using the
"query" variant under the hood). I've punted on that full conversion for
now, but did clean up a few confusing bits.

  [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check

The actual fix. This should get merged to next ASAP (or the original
topic just reverted).

  [2/4]: t5310: test delta reuse with bitmaps

I did this separately to give us flexibility to squash or merge
quickly. But it does find Ævar's bug on a git without patch 1.

  [3/4]: traverse_bitmap_commit_list(): don't free result

The original assert should have simply been useless, but it was the
surprising behavior of this function that turned it into a bug.

  [4/4]: pack-bitmap: drop "loaded" flag

And this is just an annoyance I ran into, which is a fallout from
our conversion to using an allocated bitmap_index struct.

 pack-bitmap.c   | 14 ++-
 pack-bitmap.h   |  2 +-
 t/t5310-pack-bitmaps.sh | 93 +
 3 files changed, 97 insertions(+), 12 deletions(-)

-Peff


Re: [PATCH 0/4] tests: make more use of 'test_must_be_empty'

2018-08-22 Thread Matthew DeVore
Note that you can also trivially convert a lot more instances of empty
checking to the "right" way, namely the 'test_line_count = 0' calls:

$ grep -Ir 'test_line_count = 0' t | wc -l
76

I think it would be nice to clean these up as well.



[PATCH 0/4] tests: make more use of 'test_must_be_empty'

2018-08-19 Thread SZEDER Gábor
This series is a continuation of and applies on top of
'ab/test-must-be-empty-for-master', and converts even more places to
use 'test_must_be_empty'.

There are still a bunch of cases in the form of 'test -z "$(cmd...)"'
that could use 'test_must_be_empty'...  maybe someday.

SZEDER Gábor (4):
  tests: use 'test_must_be_empty' instead of '! test -s'
  tests: use 'test_must_be_empty' instead of 'test ! -s'
  tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null '
  tests: use 'test_must_be_empty' instead of 'test_cmp  '

 t/t-basic.sh |  2 +-
 t/t0001-init.sh  |  5 +--
 t/t0003-attributes.sh|  3 +-
 t/t0030-stripspace.sh| 22 --
 t/t0040-parse-options.sh |  4 +-
 t/t0061-run-command.sh   |  9 ++--
 t/t0090-cache-tree.sh|  2 +-
 t/t0203-gettext-setlocale-sanity.sh  |  4 +-
 t/t1300-config.sh|  5 +--
 t/t1410-reflog.sh|  3 +-
 t/t1411-reflog-show.sh   |  3 +-
 t/t1450-fsck.sh  | 11 +++--
 t/t1501-work-tree.sh |  2 +-
 t/t1510-repo-setup.sh| 24 +--
 t/t1600-index.sh |  3 +-
 t/t1700-split-index.sh   |  4 +-
 t/t2013-checkout-submodule.sh|  6 +--
 t/t2200-add-update.sh|  3 +-
 t/t2203-add-intent.sh|  6 +--
 t/t2204-add-ignored.sh   |  8 ++--
 t/t3001-ls-files-others-exclude.sh   |  6 +--
 t/t3004-ls-files-basic.sh|  6 +--
 t/t3035-merge-sparse.sh  |  5 +--
 t/t3210-pack-refs.sh |  6 +--
 t/t3301-notes.sh |  4 +-
 t/t3308-notes-merge.sh   |  2 +-
 t/t3310-notes-merge-manual-resolve.sh|  8 ++--
 t/t3404-rebase-interactive.sh|  5 +--
 t/t3600-rm.sh|  6 +--
 t/t4011-diff-symlink.sh  |  2 +-
 t/t4015-diff-whitespace.sh   |  4 +-
 t/t4019-diff-wserror.sh  |  2 +-
 t/t4027-diff-submodule.sh| 29 ++---
 t/t4041-diff-submodule-option.sh | 18 
 t/t4047-diff-dirstat.sh  |  4 +-
 t/t4060-diff-submodule-option-diff-format.sh | 18 
 t/t4116-apply-reverse.sh |  2 +-
 t/t4124-apply-ws-rule.sh |  2 +-
 t/t4132-apply-removal.sh |  5 +--
 t/t4150-am.sh|  4 +-
 t/t4201-shortlog.sh  |  2 +-
 t/t4203-mailmap.sh   |  4 +-
 t/t4211-line-log.sh  |  2 +-
 t/t4212-log-corrupt.sh   |  6 +--
 t/t4300-merge-tree.sh| 34 +++
 t/t5304-prune.sh |  3 +-
 t/t5314-pack-cycle-detection.sh  |  3 +-
 t/t5401-update-hooks.sh  | 10 ++---
 t/t5500-fetch-pack.sh|  2 +-
 t/t5509-fetch-push-namespaces.sh |  2 +-
 t/t5523-push-upstream.sh |  2 +-
 t/t5526-fetch-submodules.sh  | 45 ++--
 t/t5541-http-push-smart.sh   |  2 +-
 t/t5570-git-daemon.sh|  2 +-
 t/t6112-rev-list-filters-objects.sh  |  3 +-
 t/t6120-describe.sh  |  3 +-
 t/t6130-pathspec-noglob.sh   |  9 ++--
 t/t6200-fmt-merge-msg.sh |  4 +-
 t/t7001-mv.sh|  6 +--
 t/t7004-tag.sh   |  8 +---
 t/t7008-grep-binary.sh   |  6 +--
 t/t7064-wtstatus-pv2.sh  |  5 +--
 t/t7201-co.sh|  7 ++-
 t/t7400-submodule-basic.sh   | 28 
 t/t7401-submodule-summary.sh |  2 +-
 t/t7406-submodule-update.sh  |  2 +-
 t/t7501-commit.sh|  6 +--
 t/t7600-merge.sh |  9 ++--
 t/t7610-mergetool.sh |  3 +-
 t/t7810-grep.sh  | 35 ++-
 t/t7811-grep-open.sh | 18 +++-
 t/t8010-cat-file-filters.sh  |  2 +-
 t/t9011-svn-da.sh| 14 +++---
 t/t9131-git-svn-empty-symlink.sh |  6 +--
 t/t9135-git-svn-moved-branch-empty-file.sh   |  3 +-
 t/t9200-git-cvsexportcommit.sh   |  4 +-
 t/t9802-git-p4-filetype.sh   |  2 +-
 t/t9903-bash-prompt.sh   |  6 +--
 78 files changed, 232 insertions(+), 345 deletions(-)

-- 
2.18.0.903.gab616d7dc6



[PATCH 0/4] finishing touches on jk/for-each-object-iteration

2018-08-14 Thread Jeff King
On Mon, Aug 13, 2018 at 11:45:06AM -0700, Jonathan Tan wrote:

> >   [1/7]: for_each_*_object: store flag definitions in a single location
> >   [2/7]: for_each_*_object: take flag arguments as enum
> >   [3/7]: for_each_*_object: give more comprehensive docstrings
> >   [4/7]: for_each_packed_object: support iterating in pack-order
> >   [5/7]: t1006: test cat-file --batch-all-objects with duplicates
> >   [6/7]: cat-file: rename batch_{loose,packed}_object callbacks
> >   [7/7]: cat-file: support "unordered" output for --batch-all-objects
> 
> Thanks for laying all the patches out so cleanly! All of them are:
> Reviewed-by: Jonathan Tan 
> 
> Normally I would re-explain the patches to demonstrate that I understand
> them, but in this case, I think they are simple enough - patches 1, 2,
> 3, and 6 are refactorings that I agree with, patch 5 just makes a test
> more comprehensive, and patches 4 and 7 do what their commit messages
> say.
> 
> Stefan brought up the concern that cache.h is increasing in size, but I
> agree with the patch as written that it's probably best that we
> centralize all the flags somewhere, and we can deal with the location in
> a future patch.

Thanks for the review. Here are a few patches on top to deal with the
cache.h thing, as well as some optimizations that came out of discussing
oidset in another thread (I left out for now the "big" optimization of
moving oidset to a different data structure; that's complicated enough
to be dealt with on its own, I think).

The first patch here could arguably be squashed into the final patch of
the original series, but I'm OK with it either way.

  [1/4]: cat-file: use oidset check-and-insert
  [2/4]: cat-file: split batch "buf" into two variables
  [3/4]: cat-file: use a single strbuf for all output
  [4/4]: for_each_*_object: move declarations to object-store.h

 builtin/cat-file.c | 43 +++-
 builtin/prune-packed.c |  1 +
 cache.h| 75 ---
 object-store.h | 90 ++
 packfile.h | 20 --
 5 files changed, 116 insertions(+), 113 deletions(-)

-Peff


[PATCH 0/4] Better colors in range-diff!

2018-08-10 Thread Stefan Beller
This improves colors of the range-diff, see last patch for details.

This is a partial resend of
https://public-inbox.org/git/20180804015317.182683-1-sbel...@google.com/
and is also available via

  git fetch https://github.com/stefanbeller/git sb/range-diff-better-colors

It applies on the (just reset) series of sb/range-diff-colors.

Thanks,
Stefan

Stefan Beller (4):
  diff.c: emit_line_0 to take string instead of first sign
  diff.c: add --output-indicator-{new, old, context}
  range-diff: make use of different output indicators
  range-diff: indent special lines as context

 diff.c| 43 +++
 diff.h|  5 +
 range-diff.c  | 17 -
 t/t3206-range-diff.sh | 12 ++--
 4 files changed, 58 insertions(+), 19 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog



Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Schindelin
Hi Peff,

On Fri, 10 Aug 2018, Jeff King wrote:

> On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote:
> 
> > So unless you are willing to ignore, to willfully keep this breakage,
> > I would suggest not to introduce the ugliness of an overridden
> > upload-pack for the sole purpose of disabling the tracing on one side,
> > but instead to get this here bug fixed, by helping me with this here
> > patch series.
> 
> I'm OK if you want to live with the broken test in the interim.

I realize that I failed to tell you that I basically spent 2.5 days worth
of worktime to figure this out and come up with three iterations of the
patch series (you only saw the latest).

I want this patch series in git.git, maybe not in the current form, but in
one form or another. I don't want to spend any more minute on trying to
figure out the same problem with any other regression test (which might
not even be as easily worked around as with a semi-simple --upload-pack
option).

Thank you for wanting to help. Please accept my apologies for expressing
in a poor way that I appreciate your eagerness to provide a patch, but
that I think nevertheless that it would be better to work on the GIT_TRACE
concurrency problem and its resolution via file locks. It would solve that
class of problems, instead of that single regression test being flakey.

Sorry,
Dscho


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff King
On Fri, Aug 10, 2018 at 11:34:59AM -0700, Junio C Hamano wrote:

> Johannes Sixt  writes:
> 
> > As this buglet looks like a recurring theme, a proper fix is
> > preferable to repeated work-arounds. To me it looks like we need
> > some sort of locking on Windows. Unless your friends at Microsoft have
> > an ace in their sleeves that lets us have atomic O_APPEND the POSIX
> > way...
> 
> Just to put the severity of the issue in context, we use O_APPEND in
> a few codepaths, and the trace thing for debugging is the only thing
> that could have multiple writers.  Other users of O_APPEND are:
> 
>  * refs/files-backend.c uses it so that a reflog entry can be
>appended at the end, but because update to each ref is protected
>from racing at a lot higher layer with a lock, no two people
>would try to append to the same reflog file, so atomicity of
>O_APPEND does not matter here.

Just an interesting side note, but I think one of the reasons we use
dot-locks and not flock() is that the former generally works on network
filesystems. I think O_APPEND also is not reliable for non-local files,
though, so short of actually dot-locking trace files (yuck) I don't
think there's a great solution there.
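For readers following along, the POSIX behavior being relied on here fits in a few lines: each write() through an O_APPEND descriptor is atomically positioned at the current end of a local file, so concurrent tracing processes that write whole lines cannot interleave within a single write(). This is a sketch only; the function name and path are invented, not git's actual trace code.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of trace-style logging (names invented, not git's trace.c).
 * With O_APPEND, POSIX positions each write() atomically at the
 * current end of file, so two processes appending whole lines to the
 * same local file will not interleave within a single write().
 * That guarantee is what reportedly does not hold on the Windows
 * emulation, nor on some network filesystems.
 */
static int append_line(const char *path, const char *line)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
	ssize_t len, ret;

	if (fd < 0)
		return -1;
	len = (ssize_t)strlen(line);
	ret = write(fd, line, (size_t)len);	/* one atomic append */
	close(fd);
	return ret == len ? 0 : -1;
}
```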

> It may make sense to allow GIT_TRACE to have a placeholder
> (e.g. "/tmp/traceout.$$") to help debuggers arrange to give
> different processes their own trace output file, which perhaps may
> be a simpler and less impactful to the performance solution than
> having to make locks at an application layer.

Heh, I actually wrote that patch several years ago, as part of an
abandoned effort to have config-triggered tracing (e.g., on the server
side where you're getting hundreds of requests and you want to enable
tracing on a repo for a slice of time).

One annoying thing for a case like the test in question is that you
don't actually know the pid of the process you're interested in. So we'd
generate two trace files, and then you'd have to figure out which one is
which. In my earlier work, I coupled it with some magic to allow
per-command config, like:

  [trace "fetch"]
  packet = /tmp/trace.%p

but you could also imagine a "%n" which uses the command name.

I don't remember why I abandoned it, but I think a lot had to do with
violating the "don't look at config" rules for our repo startup code. A
lot of that has been fixed since then, but I haven't really needed it.
If anybody is really interested, the patches are at:

  https://github.com/peff/git jk/trace-stdin

(my ultimate goal was to record stdin for pack-objects to replay slow
fetches, but I ended up hacking together a patch to pack-objects
instead).

-Peff


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Junio C Hamano
Jeff King  writes:

> I have tried to help with the actual problem, by identifying the root
> cause (that the trace code's strategy is not fundamentally broken, but
> that it relies on O_APPEND whose guarantees are obviously not portable
> to Windows) and exploring possible solutions there (FILE_APPEND_DATA).
>
> Lacking any kind of Windows development environment, I'd just as soon
> stop there and let more knowledgeable people take over. But please don't
> characterize my presence in this thread as some kind of willful harm or
> ignoring of the problem.

Thanks.

It seems like the other Johannes and the other Jeff are also
interested, so we may hopefully see a properly fixed O_APPEND,
which would be the real solution that would help appenders
that are *not* the tracing subsystem, too.





Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Junio C Hamano
Johannes Sixt  writes:

> As this buglet looks like a recurring theme, a proper fix is
> preferable over repeated work-arounds. To me it looks like we need
> some sort of locking on Windows. Unless your friends at Microsoft have
> an ace in their sleeves that lets us have atomic O_APPEND the POSIX
> way...

Just to put the severity of the issue in context, we use O_APPEND in
a few codepaths, and the trace thing for debugging is the only thing
that could have multiple writers.  Other users of O_APPEND are:

 * refs/files-backend.c uses it so that a reflog entry can be
   appended at the end, but because updates to each ref are protected
   from racing at a much higher layer with a lock, no two writers
   would try to append to the same reflog file, so the atomicity of
   O_APPEND does not matter here.

 * sequencer.c wants to use it when moving one insn from the todo
   list to the 'done' list when it finishes one operation.  If you
   are running two sequences in a single repository, an intermixed
   'done' list would be the least of your problems, so presumably we
   are fine here.

It may make sense to allow GIT_TRACE to have a placeholder
(e.g. "/tmp/traceout.$$") to help debuggers arrange to give
different processes their own trace output file, which may be a
simpler solution, with less performance impact, than having to take
locks at the application layer.




Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff King
On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote:

> > > Correct.  But I am more worried about the "mixed/overwriting"
> > > breakage, if there is one; it means we may need to be prepared for
> > > systems that lack O_APPEND that works correctly.  I initially just
> > > assumed that it was what Dscho was seeing, but after re-reading his
> > > message, I am not sure anymore.
> > > 
> > > I think the "do not trace the other side" approach you suggest for
> > > these tests that only care about one side is more appropriate
> > > solution for this particular case.  We then do not have to worry
> > > about overwriting or output from both sides mixed randomly.
> 
> Just so everybody knows: I never received the above two mails. Something
> is seriously rotten in GMX. This seems to have started a week or two ago.

I think it may just have been that they went to the gitgitgadget
address; I explicitly added you as a cc when I sent the patch with a
commit message, which was when I first noticed you weren't cc'd on the
earlier bits.

> What you cannot see easily, unless you go the route that I offered
> Jonathan (open a PR on gitgitgadget/git which will automatically run the
> test suite on Windows, macOS and Linux, and of course you can do anything
> in a PR, including narrowing down what tests are run) is that sometimes
> those lines in the `trace` file were clearly *incomplete*. That is, even
> if both processes tried to write atomically, one managed to overwrite the
> other's buffer in flight.
> 
> This is a pretty serious thing, even worse than the failing test suite,
> and I don't think that your patch acknowledges how serious this is.

I don't see anything in what you wrote above or in your original cover
letter or series that contradicts the notion that O_APPEND is simply not
atomic on Windows. Overwriting the same byte ranges in the file is what
I would expect on Linux, as well, if the file were opened without
O_APPEND.

Searching online gives me mixed reports on whether an equivalent to
O_APPEND exists. Some sources seem to say that FILE_APPEND_DATA is such
an equivalent, but I couldn't find anything definitive in Microsoft's
documentation about whether it actually provides atomicity.  From my
(admittedly limited) digging, it does not seem to be used by the msys2
runtime, so it might be worth exploring.

> And please don't give me that "but it works on Linux" response. Seriously.
> Sheesh.

But it does. And it's designed to. There is literally nothing to fix on
Linux.

> Even if you have not managed to trigger it, the POSIX standard seems not
> to define clearly what the behavior is of two competing write() calls,
> unless you lock the files appropriately, as my patch series does.

POSIX says:

  If the O_APPEND flag of the file status flags is set, the file offset
  shall be set to the end of the file prior to each write and no
  intervening file modification operation shall occur between changing
  the file offset and the write operation.

and:

  This volume of POSIX.1-2017 does not specify the behavior of
  concurrent writes to a regular file from multiple threads, except that
  each write is atomic.

I admit the writing is a little turgid, but I think between the two it
is promising the behavior that we expect (and I think that
interpretation is pretty common in the Unix world).

> So unless you are willing to ignore, to willfully keep this breakage, I
> would suggest not to introduce the ugliness of an overridden upload-pack
> for the sole purpose of disabling the tracing on one side, but instead to
> get this here bug fixed, by helping me with this here patch series.

I'm OK if you want to live with the broken test in the interim. But
after your complaining about the imminence of it "infecting" master, I
thought you might be happy to see a solution that was more immediate.

I have tried to help with the actual problem, by identifying the root
cause (that the trace code's strategy is not fundamentally broken, but
that it relies on O_APPEND whose guarantees are obviously not portable
to Windows) and exploring possible solutions there (FILE_APPEND_DATA).

Lacking any kind of Windows development environment, I'd just as soon
stop there and let more knowledgeable people take over. But please don't
characterize my presence in this thread as some kind of willful harm or
ignoring of the problem.

-Peff


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Sixt

On 10.08.2018 at 18:51, Jeff Hostetler wrote:



On 8/10/2018 12:15 PM, Johannes Sixt wrote:

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:

I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the behavior
of O_APPEND is well-defined under POSIX, but the last time I looked for an
equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 



It is basically the same as Peff suggests: log only one side of the 
fetch.


As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need 
some sort of locking on Windows. Unless your friends at Microsoft have 
an ace in their sleeves that lets us have atomic O_APPEND the POSIX 
way...


The official Windows CRT for open() does document an O_APPEND, but after
looking at the distributed source, I'm not sure I trust it.


I looked at the CRT code, too, and no, we can't trust it.


I have not looked at the MSYS version of the CRT yet so I cannot comment
on it.

I did find that the following code does what we want.  Notice the use of
FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
are very vague here.)


As far as I understand, FILE_APPEND_DATA is just a permission flag (I'm 
not sure why that would be necessary), but at least the documentation 
states that it is still necessary for the caller to set the file pointer.


But I haven't tried your code below, yet. Should it append the line of 
text on each invocation? And if so, is it an atomic operation?



Before we try a locking solution, perhaps we should tweak mingw_open()
to use something like this to get a proper HANDLE directly and then use
_open_osfhandle() to get a "fd" for it.

Jeff

-

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt",
	    FILE_APPEND_DATA,
	    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
	    NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);

	return 0;
}





Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff Hostetler




On 8/10/2018 12:51 PM, Jeff Hostetler wrote:



On 8/10/2018 12:15 PM, Johannes Sixt wrote:

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:

I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the behavior
of O_APPEND is well-defined under POSIX, but the last time I looked for an
equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 



It is basically the same as Peff suggests: log only one side of the 
fetch.


As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need 
some sort of locking on Windows. Unless your friends at Microsoft have 
an ace in their sleeves that lets us have atomic O_APPEND the POSIX 
way...


The official Windows CRT for open() does document an O_APPEND, but after
looking at the distributed source, I'm not sure I trust it.

I have not looked at the MSYS version of the CRT yet so I cannot comment
on it.

I did find that the following code does what we want.  Notice the use of
FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
are very vague here.)


d'oh

s/GENERIC_READ/GENERIC_WRITE/



Before we try a locking solution, perhaps we should tweak mingw_open()
to use something like this to get a proper HANDLE directly and then use
_open_osfhandle() to get a "fd" for it.

Jeff

-

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt",
	    FILE_APPEND_DATA,
	    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
	    NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);

	return 0;
}


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff Hostetler




On 8/10/2018 12:15 PM, Johannes Sixt wrote:

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:

I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the behavior
of O_APPEND is well-defined under POSIX, but the last time I looked for an
equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742

It is basically the same as Peff suggests: log only one side of the fetch.

As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need some 
sort of locking on Windows. Unless your friends at Microsoft have an ace 
in their sleeves that lets us have atomic O_APPEND the POSIX way...


The official Windows CRT for open() does document an O_APPEND, but after
looking at the distributed source, I'm not sure I trust it.

I have not looked at the MSYS version of the CRT yet so I cannot comment
on it.

I did find that the following code does what we want.  Notice the use of
FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
are very vague here.)

Before we try a locking solution, perhaps we should tweak mingw_open()
to use something like this to get a proper HANDLE directly and then use
_open_osfhandle() to get a "fd" for it.

Jeff

-

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt",
	    FILE_APPEND_DATA,
	    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
	    NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);

	return 0;
}


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Schindelin
Hi,

On Fri, 10 Aug 2018, Jeff King wrote:

> On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote:
> 
> > Jeff King  writes:
> > 
> > > Are you sure that it's not well-defined? We open the path with O_APPEND,
> > > which means every write() will be atomically positioned at the end of
> > > file. So we would never lose or overwrite data.
> > >
> > > We do our own buffering in a strbuf, writing the result out in a single
> > > write() call (modulo the OS returning a short write, but that should not
> > > generally happen when writing short strings to a file). So we should get
> > > individual trace lines as atomic units.
> > >
> > > The order of lines from the two processes is undefined, of course.
> > 
> > Correct.  But I am more worried about the "mixed/overwriting"
> > breakage, if there is one; it means we may need to be prepared for
> > systems that lack O_APPEND that works correctly.  I initially just
> > assumed that it was what Dscho was seeing, but after re-reading his
> > message, I am not sure anymore.
> > 
> > I think the "do not trace the other side" approach you suggest for
> > these tests that only care about one side is more appropriate
> > solution for this particular case.  We then do not have to worry
> > about overwriting or output from both sides mixed randomly.

Just so everybody knows: I never received the above two mails. Something
is seriously rotten in GMX. This seems to have started a week or two ago.

> Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets
> us pursue any fix for interleaved trace output on Windows without the
> pressure of an impending flaky test.
> 
> My gut says that looking into making O_APPEND work there is going to be
> the nicest solution, but my gut is not very well versed in Windows
> subtleties. ;)

Your patch seems to be a patch in the original sense of the word: it is
not a solution.

And let me spell it out very clearly (as I have mentioned to Jonathan a
couple times, without any response): t5552 fails regularly, way more than
rarely, see for yourself:

https://git-for-windows.visualstudio.com/git/_build/index?definitionId=1&_a=history

And the reason it fails was analyzed by me and described in the commit
message (I am sorry that the cover letter was still stale and talked
about flock(), which I had decided was not the right approach after
fighting with Linux over it).

What you cannot see easily, unless you go the route that I offered
Jonathan (open a PR on gitgitgadget/git which will automatically run the
test suite on Windows, macOS and Linux, and of course you can do anything
in a PR, including narrowing down what tests are run) is that sometimes
those lines in the `trace` file were clearly *incomplete*. That is, even
if both processes tried to write atomically, one managed to overwrite the
other's buffer in flight.

This is a pretty serious thing, even worse than the failing test suite,
and I don't think that your patch acknowledges how serious this is.

And please don't give me that "but it works on Linux" response. Seriously.
Sheesh.

Even if you have not managed to trigger it, the POSIX standard seems not
to define clearly what the behavior is of two competing write() calls,
unless you lock the files appropriately, as my patch series does.

So unless you are willing to ignore, to willfully keep this breakage, I
would suggest not to introduce the ugliness of an overridden upload-pack
for the sole purpose of disabling the tracing on one side, but instead to
get this here bug fixed, by helping me with this here patch series.

We don't have to rush this, you know? We have to fix it, though.

Thank you,
Dscho

> -- >8 --
> Subject: [PATCH] t5552: suppress upload-pack trace output
> 
> The t5552 test script uses GIT_TRACE_PACKET to monitor what
> git-fetch sends and receives. However, because we're
> accessing a local repository, the child upload-pack also
> sends trace output to the same file.
> 
> On Linux, this works out OK. We open the trace file with
> O_APPEND, so all writes are atomically positioned at the end
> of the file. No data can be overwritten or omitted. And
> since we prepare our small writes in a strbuf and write them
> with a single write(), we should see each line as an atomic
> unit. The order of lines between the two processes is
> undefined, but the test script greps only for "fetch>" or
> "fetch<" lines. So under Linux, the test results are
> deterministic.
> 
> The test fails intermittently on Windows, however,
> reportedly even overwriting bits of the output file (i.e.,
> O_APPEND does not seem to give us an atomic position+write).
> 
> Since the test only cares about the trace output from fetch,
> we can just disable the output from upload-pack. That
> doesn't solve the greater question of O_APPEND/trace issues
> under Windows, but it easily fixes the flakiness from this
> test.
> 
> Reported-by: Johannes Schindelin 
> Signed-off-by: Jeff King 
> ---
> I'm assuming that this really isn't 

Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Sixt

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:

I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the behavior
of O_APPEND is well-defined under POSIX, but the last time I looked for an
equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742

It is basically the same as Peff suggests: log only one side of the fetch.

As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need some 
sort of locking on Windows. Unless your friends at Microsoft have an ace 
in their sleeves that lets us have atomic O_APPEND the POSIX way...


-- Hannes


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Junio C Hamano
Jeff King  writes:

> Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets
> us pursue any fix for interleaved trace output on Windows without the
> pressure of an impending flaky test.
>
> My gut says that looking into making O_APPEND work there is going to be
> the nicest solution, but my gut is not very well versed in Windows
> subtleties. ;)

As we know these tests are only interested in "fetch" lines and not
record of communication in the other direction, I think this is a
nice clean-up that is worth doing.

For only two greps it may not look worth doing, but it would also
be a good clean-up to add a got_acked helper that sits next to the
have_sent and have_not_sent helpers for readability.  That is of
course outside the scope of this change.

Will queue.  Thanks.

> -- >8 --
> Subject: [PATCH] t5552: suppress upload-pack trace output
>
> The t5552 test script uses GIT_TRACE_PACKET to monitor what
> git-fetch sends and receives. However, because we're
> accessing a local repository, the child upload-pack also
> sends trace output to the same file.
>
> On Linux, this works out OK. We open the trace file with
> O_APPEND, so all writes are atomically positioned at the end
> of the file. No data can be overwritten or omitted. And
> since we prepare our small writes in a strbuf and write them
> with a single write(), we should see each line as an atomic
> unit. The order of lines between the two processes is
> undefined, but the test script greps only for "fetch>" or
> "fetch<" lines. So under Linux, the test results are
> deterministic.
>
> The test fails intermittently on Windows, however,
> reportedly even overwriting bits of the output file (i.e.,
> O_APPEND does not seem to give us an atomic position+write).
>
> Since the test only cares about the trace output from fetch,
> we can just disable the output from upload-pack. That
> doesn't solve the greater question of O_APPEND/trace issues
> under Windows, but it easily fixes the flakiness from this
> test.
>
> Reported-by: Johannes Schindelin 
> Signed-off-by: Jeff King 
> ---
> I'm assuming that this really isn't triggerable on Linux. I tried and
> couldn't manage to get it to fail, and the reasoning above explains why.
> But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows.
>
>  t/t5552-skipping-fetch-negotiator.sh | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/t/t5552-skipping-fetch-negotiator.sh 
> b/t/t5552-skipping-fetch-negotiator.sh
> index 0a8e0e42ed..0ad50dd839 100755
> --- a/t/t5552-skipping-fetch-negotiator.sh
> +++ b/t/t5552-skipping-fetch-negotiator.sh
> @@ -28,6 +28,19 @@ have_not_sent () {
>   done
>  }
>  
> +# trace_fetch <client> <server> [args]
> +#
> +# Trace the packet output of fetch, but make sure we disable the variable
> +# in the child upload-pack, so we don't combine the results in the same file.
> +trace_fetch () {
> + client=$1; shift
> + server=$1; shift
> + GIT_TRACE_PACKET="$(pwd)/trace" \
> + git -C "$client" fetch \
> +   --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
> +   "$server" "$@"
> +}
> +
>  test_expect_success 'commits with no parents are sent regardless of skip 
> distance' '
>   git init server &&
>   test_commit -C server to_fetch &&
> @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent 
> regardless of skip distanc
>   # "c1" has no parent, it is still sent as "have" even though it would
>   # normally be skipped.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c7 c5 c2 c1 &&
>   have_not_sent c6 c4 c3
>  '
> @@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the 
> larger one' '
>   # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
>   # (from "c5side" skip 1).
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c5side c11 c9 c6 c1 &&
>   have_not_sent c10 c8 c7 c5 c4 c3 c2
>  '
> @@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out 
> commits' '
>   # not need to send any ancestors of "c3", but we still need to send "c3"
>   # itself.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
> + trace_fetch client origin to_fetch &&
>   have_sent c5 c4^ c2side &&
>   have_not_sent c4 c4^^ c4^^^
>  '
> @@ -121,7 +134,7 @@ test_expect_success 'handle clock skew' '
>   # and sent, because (due to clock skew) its only parent has already been
>   # popped off the priority queue.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - 


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff King
On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Are you sure that it's not well-defined? We open the path with O_APPEND,
> > which means every write() will be atomically positioned at the end of
> > file. So we would never lose or overwrite data.
> >
> > We do our own buffering in a strbuf, writing the result out in a single
> > write() call (modulo the OS returning a short write, but that should not
> > generally happen when writing short strings to a file). So we should get
> > individual trace lines as atomic units.
> >
> > The order of lines from the two processes is undefined, of course.
> 
> Correct.  But I am more worried about the "mixed/overwriting"
> breakage, if there is one; it means we may need to be prepared for
> systems that lack O_APPEND that works correctly.  I initially just
> assumed that it was what Dscho was seeing, but after re-reading his
> message, I am not sure anymore.
> 
> I think the "do not trace the other side" approach you suggest for
> these tests that only care about one side is more appropriate
> solution for this particular case.  We then do not have to worry
> about overwriting or output from both sides mixed randomly.

Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets
us pursue any fix for interleaved trace output on Windows without the
pressure of an impending flaky test.

My gut says that looking into making O_APPEND work there is going to be
the nicest solution, but my gut is not very well versed in Windows
subtleties. ;)

-- >8 --
Subject: [PATCH] t5552: suppress upload-pack trace output

The t5552 test script uses GIT_TRACE_PACKET to monitor what
git-fetch sends and receives. However, because we're
accessing a local repository, the child upload-pack also
sends trace output to the same file.

On Linux, this works out OK. We open the trace file with
O_APPEND, so all writes are atomically positioned at the end
of the file. No data can be overwritten or omitted. And
since we prepare our small writes in a strbuf and write them
with a single write(), we should see each line as an atomic
unit. The order of lines between the two processes is
undefined, but the test script greps only for "fetch>" or
"fetch<" lines. So under Linux, the test results are
deterministic.

The test fails intermittently on Windows, however,
reportedly even overwriting bits of the output file (i.e.,
O_APPEND does not seem to give us an atomic position+write).

Since the test only cares about the trace output from fetch,
we can just disable the output from upload-pack. That
doesn't solve the greater question of O_APPEND/trace issues
under Windows, but it easily fixes the flakiness from this
test.

Reported-by: Johannes Schindelin 
Signed-off-by: Jeff King 
---
I'm assuming that this really isn't triggerable on Linux. I tried and
couldn't manage to get it to fail, and the reasoning above explains why.
But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows.

 t/t5552-skipping-fetch-negotiator.sh | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
index 0a8e0e42ed..0ad50dd839 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -28,6 +28,19 @@ have_not_sent () {
done
 }
 
+# trace_fetch <client> <server> [args]
+#
+# Trace the packet output of fetch, but make sure we disable the variable
+# in the child upload-pack, so we don't combine the results in the same file.
+trace_fetch () {
+   client=$1; shift
+   server=$1; shift
+   GIT_TRACE_PACKET="$(pwd)/trace" \
+   git -C "$client" fetch \
+ --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
+ "$server" "$@"
+}
+
test_expect_success 'commits with no parents are sent regardless of skip distance' '
git init server &&
test_commit -C server to_fetch &&
@@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc
# "c1" has no parent, it is still sent as "have" even though it would
# normally be skipped.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c7 c5 c2 c1 &&
have_not_sent c6 c4 c3
 '
@@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the larger one' '
# the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
# (from "c5side" skip 1).
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c5side c11 c9 c6 c1 &&
have_not_sent c10 c8 c7 c5 c4 c3 c2
 '
@@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out 

Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Jeff King
On Thu, Aug 09, 2018 at 02:08:56PM -0700, Junio C Hamano wrote:

> > Correct.  But I am more worried about the "mixed/overwriting"
> > breakage, if there is one; it means we may need to be prepared for
> > systems that lack O_APPEND that works correctly.  I initially just
> > assumed that it was what Dscho was seeing, but after re-reading his
> > message, I am not sure anymore.
> >
> > I think the "do not trace the other side" approach you suggest for
> > these tests that only care about one side is more appropriate
> > solution for this particular case.  We then do not have to worry
> > about overwriting or output from both sides mixed randomly.
> 
> A concluding sentence I forgot to add, after saying "this is simpler
> and better to fix test breakage", was
> 
>   But if we really are seeing O_APPEND breakage, a mandatory
>   locking mechanism like this one might be necessary to work
>   around it (I seriously hope we do not have to, though).
> 
> Sorry for an additional noise.

Yeah, my earlier email said "if we just care about this test, then...".

But if we do care about making this multiple-tracing case work all the
time, then we'd need some solution. If there's something that can work
like O_APPEND, even if we have to make some system-specific call, I'd
prefer that to locking.

I did some searching for atomic append on Windows and got mixed reports
on whether their FILE_DATA_APPEND does what we want or not. Knowing
nothing about msys, I'd _assume_ that O_APPEND would get translated to
that flag, but I got lost trying to find it in
git-for-windows/msys2-runtime.

Wouldn't it be wonderful if the solution were as simple as making sure
that flag was plumbed through? I seriously doubt it will be that easy,
though. :)

-Peff


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> Are you sure that it's not well-defined? We open the path with O_APPEND,
>> which means every write() will be atomically positioned at the end of
>> file. So we would never lose or overwrite data.
>>
>> We do our own buffering in a strbuf, writing the result out in a single
>> write() call (modulo the OS returning a short write, but that should not
>> generally happen when writing short strings to a file). So we should get
>> individual trace lines as atomic units.
>>
>> The order of lines from the two processes is undefined, of course.
>
> Correct.  But I am more worried about the "mixed/overwriting"
> breakage, if there is one; it means we may need to be prepared for
> systems that lack O_APPEND that works correctly.  I initially just
> assumed that it was what Dscho was seeing, but after re-reading his
> message, I am not sure anymore.
>
> I think the "do not trace the other side" approach you suggest for
> these tests that only care about one side is more appropriate
> solution for this particular case.  We then do not have to worry
> about overwriting or output from both sides mixed randomly.

A concluding sentence I forgot to add, after saying "this is simpler
and better to fix test breakage", was

But if we really are seeing O_APPEND breakage, a mandatory
locking mechanism like this one might be necessary to work
around it (I seriously hope we do not have to, though).

Sorry for an additional noise.


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Junio C Hamano
Jeff King  writes:

> Are you sure that it's not well-defined? We open the path with O_APPEND,
> which means every write() will be atomically positioned at the end of
> file. So we would never lose or overwrite data.
>
> We do our own buffering in a strbuf, writing the result out in a single
> write() call (modulo the OS returning a short write, but that should not
> generally happen when writing short strings to a file). So we should get
> individual trace lines as atomic units.
>
> The order of lines from the two processes is undefined, of course.

Correct.  But I am more worried about the "mixed/overwriting"
breakage, if there is one; it means we may need to be prepared for
systems that lack O_APPEND that works correctly.  I initially just
assumed that it was what Dscho was seeing, but after re-reading his
message, I am not sure anymore.

I think the "do not trace the other side" approach you suggest for
these tests that only care about one side is more appropriate
solution for this particular case.  We then do not have to worry
about overwriting or output from both sides mixed randomly.

> diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
> index 3b60bd44e0..5ad5bece55 100755
> --- a/t/t5552-skipping-fetch-negotiator.sh
> +++ b/t/t5552-skipping-fetch-negotiator.sh
> @@ -28,6 +28,19 @@ have_not_sent () {
>   done
>  }
>  
> +# trace_fetch <client> <server> [args]
> +#
> +# Trace the packet output of fetch, but make sure we disable the variable
> +# in the child upload-pack, so we don't combine the results in the same file.
> +trace_fetch () {
> + client=$1; shift
> + server=$1; shift
> + GIT_TRACE_PACKET="$(pwd)/trace" \
> + git -C "$client" fetch \
> +   --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
> +   "$server" "$@"
> +}
> +
>  test_expect_success 'commits with no parents are sent regardless of skip distance' '
>   git init server &&
>   test_commit -C server to_fetch &&
> @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc
>   # "c1" has no parent, it is still sent as "have" even though it would
>   # normally be skipped.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c7 c5 c2 c1 &&
>   have_not_sent c6 c4 c3
>  '
> @@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the larger one' '
>   # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
>   # (from "c5side" skip 1).
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c5side c11 c9 c6 c1 &&
>   have_not_sent c10 c8 c7 c5 c4 c3 c2
>  '
> @@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out commits' '
>   # not need to send any ancestors of "c3", but we still need to send "c3"
>   # itself.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
> + trace_fetch client origin to_fetch &&
>   have_sent c5 c4^ c2side &&
>   have_not_sent c4 c4^^ c4^^^
>  '
> @@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' '
>   # and sent, because (due to clock skew) its only parent has already been
>   # popped off the priority queue.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c2 c1 old4 old2 old1 &&
>   have_not_sent old3
>  '
> @@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of commits that server AC
>   test_commit -C server commit-on-b1 &&
>  
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" 
> to_fetch &&
> + trace_fetch client "$(pwd)/server" to_fetch &&
>   grep "  fetch" trace &&
>  
>   # fetch-pack sends 2 requests each containing 16 "have" lines before


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Jeff King
On Thu, Aug 09, 2018 at 10:35:24AM -0700, Johannes Schindelin via GitGitGadget wrote:

> The culprit is that two processes try simultaneously to write to the same
> file specified via GIT_TRACE_PACKET, and it is not well defined how that
> should work, even on Linux.

Are you sure that it's not well-defined? We open the path with O_APPEND,
which means every write() will be atomically positioned at the end of
file. So we would never lose or overwrite data.

We do our own buffering in a strbuf, writing the result out in a single
write() call (modulo the OS returning a short write, but that should not
generally happen when writing short strings to a file). So we should get
individual trace lines as atomic units.

The order of lines from the two processes is undefined, of course.
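[Editorial illustration, not part of the original thread: the O_APPEND behavior described above can be observed with a small script of my own. Two background writers append to one file via ">>", which opens with O_APPEND; no line is ever lost or clobbered, only the interleaving order varies.]

```shell
#!/bin/sh
# Two writers share one file opened with O_APPEND (">>" in the shell),
# mirroring how fetch and upload-pack share a GIT_TRACE_PACKET file.
out=appendtest.txt
: >"$out"
write_lines () {
	i=0
	while [ "$i" -lt 100 ]
	do
		# each echo is a separate O_APPEND open + single write
		echo "$1 line $i" >>"$out"
		i=$((i + 1))
	done
}
write_lines writer-A &
write_lines writer-B &
wait
# all 200 lines survive intact; only their relative order is undefined
wc -l <"$out"
```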

> This patch series addresses that by locking the trace fd. I chose to use 
> flock() instead of fcntl() because the Win32 API LockFileEx()
> [https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex]
>  
> (which does exactly what I want in this context) has much more similar
> semantics to the former than the latter.

I must admit I'm not excited to see us adding extra locking and
complexity when it is not necessarily needed. Is this a feature we
actually care about delivering for normal users of GIT_TRACE, or do we
just care about solving the test flakiness here?

Because we should be able to do the latter with the patch below.

I also wonder if we should be clearing GIT_TRACE as part of clearing
repo-env. It would do what we want automatically in this case, though
there are definitely cases where I prefer the current behavior (because
I really do want to see the upload-pack side of it). Of course I could
always use "--upload-pack 'GIT_TRACE_PACKET=... upload-pack'", so this
is really just a default.

diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
index 3b60bd44e0..5ad5bece55 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -28,6 +28,19 @@ have_not_sent () {
done
 }
 
+# trace_fetch <client> <server> [args]
+#
+# Trace the packet output of fetch, but make sure we disable the variable
+# in the child upload-pack, so we don't combine the results in the same file.
+trace_fetch () {
+   client=$1; shift
+   server=$1; shift
+   GIT_TRACE_PACKET="$(pwd)/trace" \
+   git -C "$client" fetch \
+ --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
+ "$server" "$@"
+}
+
test_expect_success 'commits with no parents are sent regardless of skip distance' '
git init server &&
test_commit -C server to_fetch &&
@@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc
# "c1" has no parent, it is still sent as "have" even though it would
# normally be skipped.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c7 c5 c2 c1 &&
have_not_sent c6 c4 c3
 '
@@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the larger one' '
# the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
# (from "c5side" skip 1).
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c5side c11 c9 c6 c1 &&
have_not_sent c10 c8 c7 c5 c4 c3 c2
 '
@@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out commits' '
# not need to send any ancestors of "c3", but we still need to send "c3"
# itself.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
+   trace_fetch client origin to_fetch &&
have_sent c5 c4^ c2side &&
have_not_sent c4 c4^^ c4^^^
 '
@@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' '
# and sent, because (due to clock skew) its only parent has already been
# popped off the priority queue.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c2 c1 old4 old2 old1 &&
have_not_sent old3
 '
@@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of commits that server AC
test_commit -C server commit-on-b1 &&
 
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" to_fetch &&
+   trace_fetch client "$(pwd)/server" to_fetch &&
grep "  fetch" trace &&
 
# fetch-pack sends 2 requests each containing 16 "have" 

[PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Johannes Schindelin via GitGitGadget
I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot 
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.

This patch series addresses that by locking the trace fd. I chose to use 
flock() instead of fcntl() because the Win32 API LockFileEx()
[https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex]
 
(which does exactly what I want in this context) has much more similar
semantics to the former than the latter.

Of course, I have to admit that I am not super solid on flock() semantics,
and I also do not know which conditional blocks in config.mak.uname should
grow a HAVE_FLOCK = YesWeDo line, still. Reviewers knowledgeable in flock() 
semantics: I would be much indebted if you helped me there. Also: is it safe
to call flock() on file descriptors referring not to files, but, say, pipes
or an interactive terminal?
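[Editorial illustration, not part of the original thread and not the series' C implementation: the locking idea being proposed can be sketched in shell with util-linux's flock(1) utility, using file names of my own choosing.]

```shell
#!/bin/sh
# Serialize appends to a shared trace file with an exclusive advisory
# lock: each writer blocks until it holds the lock, appends, and
# releases it on exit from the subshell, so no two entries can
# interleave mid-write even without O_APPEND guarantees.
trace=trace.out
: >"$trace"
locked_append () {
	(
		flock -x 9                     # wait for the exclusive lock
		printf '%s\n' "$1" >>"$trace"
	) 9>>"$trace.lock"
}
locked_append "fetch> want deadbeef" &
locked_append "fetch< ACK deadbeef" &
wait
wc -l <"$trace"                        # both entries arrive intact
```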

Johannes Schindelin (4):
  Introduce a function to lock/unlock file descriptors when appending
  mingw: implement lock_or_unlock_fd_for_appending()
  trace: lock the trace file to avoid racy trace_write() calls
  trace: verify that locking works

 Makefile   |   1 +
 compat/mingw.c |  19 ++
 compat/mingw.h |   3 +
 config.mak.uname   |   3 +
 git-compat-util.h  |   2 +
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/helper/test-trace.c  | 130 +
 t/t0070-fundamental.sh |   6 ++
 trace.c|  11 +++-
 wrapper.c  |  14 +
 11 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-trace.c


base-commit: 42cc7485a2ec49ecc440c921d2eb0cae4da80549
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-17%2Fdscho%2Ffetch-negotiator-skipping-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-17/dscho/fetch-negotiator-skipping-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/17
-- 
gitgitgadget


Re: [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges

2018-08-05 Thread Eric Sunshine
On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget wrote:
> Now, I am fairly certain that the changes are correct, but given my track
> record with off-by-one bugs (and once even an off-by-two bug), I would
> really appreciate some thorough review of this code, in particular the
> second one that is the actual bug fix. I am specifically interested in
> reviews from people who know line-log.c pretty well and can tell me whether
> the src[i].end > target[j].end is correct, or whether it should actually
> have been a >= (I tried to wrap my head around this, but I would feel more
> comfortable if a domain expert would analyze this, whistling, and looking
> Eric's way).

Although I fixed a number of bugs in basic range-set manipulation code
(5 years ago), I have no experience with or knowledge of the code
actually dealing with range-set manipulation in relation to diffs, so
I'm not a domain expert by any means, and I'm _still_ trying to wrap
my head around that code.

That said, I left some comments on the individual patches suggesting a
simpler and more correct fix for the crash. (Despite that suggestion
for fixing the crash, I still can't say whether the actual
computations performed by the existing code are correct, since I still
don't understand that aspect of it.)


[PATCH 0/4] line-log: be more careful when adjusting multiple line ranges

2018-08-04 Thread Johannes Schindelin via GitGitGadget
I am a heavy user of git log -L  In fact, I use the feature where
multiple ranges can be specified extensively, via a not-exactly-trivial
shell script function that takes the currently staged changes (or if none
are staged, the current unstaged changes) and turns them into the
corresponding commit history:

staged_log () { #
diff="$(git diff --cached -U1)"
test -n "$diff" ||
diff="$(git diff -U1)"
test -n "$diff" ||
die "No changes"
eval "git log $(echo "$diff" |
sed -ne '/^--- a\//{s/^-* a\/\(.*\)/'\''\1'\''/;x}' -e \
'/^@@ -/{s/^@@ -\([^, ]*\),\([^ ]*\).*/-L \1,+\2/;G;s/\n/:/g;p}' |
tr '\n' ' ')"
}

This is an extremely useful way to look at the history, especially when
trying to fix up a commit deep in a long branch (or a thicket of branches).

Today, however, this method failed me, by greeting me with an assertion.
When I tried to paper over that assertion by joining line ranges that became
adjacent (or overlapping), it still produced a segmentation fault when the
line-log tried to print lines past the file contents.

So I had no choice but to fix this properly.

I still wanted to keep the optimization where multiple line ranges are
joined into a single one (I am convinced that this also affects the output,
where previously multiple hunks would have been displayed, but I ran out of
time to investigate this). This is the 3rd patch. It is not purely an
optimization, as the assertion would still trigger when line ranges could be
joined.

Now, I am fairly certain that the changes are correct, but given my track
record with off-by-one bugs (and once even an off-by-two bug), I would
really appreciate some thorough review of this code, in particular the
second one that is the actual bug fix. I am specifically interested in
reviews from people who know line-log.c pretty well and can tell me whether
the src[i].end > target[j].end is correct, or whether it should actually
have been a >= (I tried to wrap my head around this, but I would feel more
comfortable if a domain expert would analyze this, whistling, and looking
Eric's way).

Cc: Eric Sunshine <sunsh...@sunshineco.com>

Johannes Schindelin (4):
  line-log: demonstrate a bug with nearly-overlapping ranges
  line-log: adjust start/end of ranges individually
  line-log: optimize ranges by joining them when possible
  line-log: convert an assertion to a full BUG() call

 line-log.c  | 18 +++---
 t/t4211-line-log.sh | 17 +
 2 files changed, 32 insertions(+), 3 deletions(-)


base-commit: 1d89318c48d233d52f1db230cf622935ac3c69fa
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-15%2Fdscho%2Fline-log-fix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-15/dscho/line-log-fix-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/15
-- 
gitgitgadget


Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-18 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Olga,
>
> On Fri, 13 Jul 2018, Оля Тележная wrote:
>
>> 2018-07-09 11:27 GMT+03:00 Оля Тележная :
>> > Hello everyone,
>> > This is my new attempt to start using oid_object_info_extended() in
>> > ref-filter. You could look at previous one [1] [2] but it is not
>> > necessary.
>> >
>> > The goal (still) is to improve performance by avoiding calling expensive
>> > functions when we don't need the information they provide
>> > or when we could get it by using a cheaper function.
>> >
>> > This patch is a middle step. In the end, I want to add new atoms
>> > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
>> > cat-file command.
>> >
>> > I also know about problems with memory leaks in ref-filter: that would
>> > be my next task that I will work on. Since I did not generate any new
>> > leaks in this patch (just use existing ones), I decided to put this
>> > part on a review and fix leaks as a separate task.
>> 
>> UPDATES since v1:
>> add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
>> improve second commit message (thanks to Junio C Hamano)
>> add static keyword (thanks to Ramsay Jones)
>> 
>> >
>> > Thank you!
>> >
>> > [1] https://github.com/git/git/pull/493
>
> Could you please populate the description of that PR so that SubmitGit
> picks it up as cover letter?

Thanks for suggesting that.  Yes, an updated version of a series,
even if it is a small one with just 4 or 5 patches, becomes much
easier to read with a well-written cover letter.


Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-18 Thread Johannes Schindelin
Hi Olga,

On Fri, 13 Jul 2018, Оля Тележная wrote:

> 2018-07-09 11:27 GMT+03:00 Оля Тележная :
> > Hello everyone,
> > This is my new attempt to start using oid_object_info_extended() in
> > ref-filter. You could look at previous one [1] [2] but it is not
> > necessary.
> >
> > The goal (still) is to improve performance by avoiding calling expensive
> > functions when we don't need the information they provide
> > or when we could get it by using a cheaper function.
> >
> > This patch is a middle step. In the end, I want to add new atoms
> > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
> > cat-file command.
> >
> > I also know about problems with memory leaks in ref-filter: that would
> > be my next task that I will work on. Since I did not generate any new
> > leaks in this patch (just use existing ones), I decided to put this
> > part on a review and fix leaks as a separate task.
> 
> UPDATES since v1:
> add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
> improve second commit message (thanks to Junio C Hamano)
> add static keyword (thanks to Ramsay Jones)
> 
> >
> > Thank you!
> >
> > [1] https://github.com/git/git/pull/493

Could you please populate the description of that PR so that SubmitGit
picks it up as cover letter?

Thanks,
Johannes

> > [2] https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/
> 

Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-13 Thread Оля Тележная
2018-07-09 11:27 GMT+03:00 Оля Тележная :
> Hello everyone,
> This is my new attempt to start using oid_object_info_extended() in
> ref-filter. You could look at previous one [1] [2] but it is not
> necessary.
>
> The goal (still) is to improve performance by avoiding calling expensive
> functions when we don't need the information they provide
> or when we could get it by using a cheaper function.
>
> This patch is a middle step. In the end, I want to add new atoms
> ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
> cat-file command.
>
> I also know about problems with memory leaks in ref-filter: that would
> be my next task that I will work on. Since I did not generate any new
> leaks in this patch (just use existing ones), I decided to put this
> part on a review and fix leaks as a separate task.

UPDATES since v1:
add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
improve second commit message (thanks to Junio C Hamano)
add static keyword (thanks to Ramsay Jones)

>
> Thank you!
>
> [1] https://github.com/git/git/pull/493
> [2] https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/


[PATCH 0/4] test-lint: detect one-shot "FOO=bar shell_func"

2018-07-12 Thread Eric Sunshine
One-shot "FOO=bar cmd" environment variable assignments exist only
during invocation of 'cmd'. However, if 'cmd' is a shell function, then
the variable is assigned in the running shell and exists until the
process exits (or is unset explicitly). Such a side-effect is almost
certainly unintended by a test author and is likely due to lack of
familiarity with the problem.

Upgrade "make test-lint" to detect this sort of suspect usage.

Also fix a couple instances of "FOO=bar shell_func" detected by the
improved linting.
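[Editorial illustration, not part of the original message: the pitfall being linted for can be reproduced in a few lines, with names of my own choosing.]

```shell
#!/bin/sh
# One-shot assignments are scoped to the command for external commands,
# but when the command is a shell *function*, POSIX permits the
# assignment to outlive the call -- dash and ksh leak it into the
# running shell, while other shells do not.
say_foo () {
	echo "in function: FOO=$FOO"
}
FOO=bar say_foo
# Whether FOO is still "bar" here is shell-dependent; a test relying
# on it being unset afterwards is therefore flaky.
echo "after call: FOO=${FOO:-unset}"
```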

This series is built atop 'jc/t3404-one-shot-export-fix'[1].

[1]: https://public-inbox.org/git/xmqqefg8w73c@gitster-ct.c.googlers.com/T/

Eric Sunshine (4):
  t6046/t9833: fix use of "VAR=VAL cmd" with a shell function
  t/check-non-portable-shell: stop being so polite
  t/check-non-portable-shell: make error messages more compact
  t/check-non-portable-shell: detect "FOO=bar shell_func"

 t/check-non-portable-shell.pl  | 31 +-
 t/t6046-merge-skip-unneeded-updates.sh |  4 +++-
 t/t9833-errors.sh  |  4 +++-
 3 files changed, 32 insertions(+), 7 deletions(-)

-- 
2.18.0.233.g985f88cf7e


Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-10 Thread Johannes Schindelin
Hi Olga,

On Mon, 9 Jul 2018, Оля Тележная wrote:

> [2] https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/

This type of Message-Id makes me think that you used SubmitGit to send
this patch series.

The main problem I see here is that the patches are not sent as replies to
this cover letter, and therefore they are seemingly disconnected on the
mailing list.

It was also my impression that SubmitGit started supporting sending cover
letters, in which case you would not have to jump through hoops to thread
the mails properly. But for that to work, the PR has to have a description
which is then used as cover letter. I do not see any description in
https://github.com/git/git/pull/520, though. Maybe provide one?

Ciao,
Johannes

P.S.: You might have noticed that I am working (slowly, but steadily) on a
contender for SubmitGit that I call GitGitGadget. Originally, I really
wanted to enhance SubmitGit instead because I am a big believer of *not*
reinventing the wheel (so much energy gets wasted that way).

However, in this case the limitations of the chosen language (I do not
want to learn Scala, I have absolutely zero need to know Scala in any of
my other endeavors, and my time to learn new things is limited, so I spend
it wisely) and the limitations of the design (the UI is completely
separate from GitHub, you have to allow Amazon to send mails in your name,
and SubmitGit's design makes it impossible to work bi-directionally, it is
only GitHub -> mailing list, while I also want the option to add replies
on the mailing list as comments to the GitHub PR in the future) made me
reconsider.

If you want to kick the tires, so to say, I welcome you to give
GitGitGadget a try. It would require only a couple of things from you:

- You would have to settle for a branch name, and then not open new PRs
  for every iteration you want to send, but force-push the branch instead.

- You would have to open a PR at https://github.com/gitgitgadget/git.

- You would have to provide the cover letter via the PR's description (and
  update that description before sending newer iterations).

- I would have to add you to the list of users allowed to send patches via
  GitGitGadget (GitGitGadget has some really light-weight access control
  to prevent spamming).

- You would then send a new iteration by simply adding a comment to your
  PR that contains this command: /submit

- To integrate well with previous patch series iterations (i.e. to connect
  the threads), I would have to come up with a little bit of tooling to
  add some metadata that I have to reconstruct manually from your
  previously-sent iterations.
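The force-push-the-same-branch workflow described above can be tried
locally; the remote and branch names below are made up for illustration,
with a bare repository standing in for github.com/gitgitgadget/git:

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare hub.git            # stand-in for the GitGitGadget fork
git init -q work && cd work
git remote add gitgitgadget ../hub.git
git -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "my-topic v1"
git push -q gitgitgadget HEAD:refs/heads/my-topic
# next iteration: rework the commits, then force-push the SAME branch
# (which updates the existing PR) instead of opening a new PR
git -c user.name=me -c user.email=me@example.com \
    commit -q --amend --allow-empty -m "my-topic v2"
git push -q --force gitgitgadget HEAD:refs/heads/my-topic
git --git-dir=../hub.git log -1 --format=%s my-topic   # my-topic v2
```

The remote now holds only the latest iteration of the branch, which is
exactly the state GitGitGadget's `/submit` command would pick up.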

Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-09 Thread Junio C Hamano
Оля Тележная   writes:

> Hello everyone,
> This is my new attempt to start using oid_object_info_extended() in
> ref-filter. You can look at the previous one [1] [2], but it is not
> necessary.

Yup, it sounds like a sensible thing to try: ask the object-info helper
instead of reading the whole object in-core and inspecting it ourselves,
when we can avoid it.



[PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-09 Thread Оля Тележная
Hello everyone,
This is my new attempt to start using oid_object_info_extended() in
ref-filter. You can look at the previous one [1] [2], but it is not
necessary.

The goal (still) is to improve performance by avoiding calling expensive
functions when we don't need the information they provide
or when we could get it by using a cheaper function.
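The same cheap-vs-expensive split is visible from the plumbing side with
`git cat-file` (used here only as an analogy for the internal change):
`--batch-check` answers type/size questions without loading the object's
full contents, while `-p` has to read the whole thing:

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
echo hello >greeting && git add greeting
oid=$(git rev-parse :greeting)
# cheap: metadata only, no need to load the blob's contents
echo "$oid" | git cat-file --batch-check='%(objecttype) %(objectsize)'
# expensive by comparison: reads and inflates the full object
git cat-file -p "$oid"
```

The first command prints `blob 6`, the second the blob's contents; atoms
like %(objectsize) are exactly the kind of question oid_object_info()
can answer without a full read.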

This series is an intermediate step. In the end, I want to add new atoms
("objectsize:disk" and "deltabase") and reuse the ref-filter logic in the
cat-file command.

I also know about the memory leaks in ref-filter: fixing them will be my
next task. Since this patch does not introduce any new leaks (it only
uses existing ones), I decided to send this part out for review and fix
the leaks as a separate task.

Thank you!

[1] https://github.com/git/git/pull/493
[2] 
https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/


Re: [RFC PATCH 0/4] Fix occasional test failures in http tests

2018-06-14 Thread Jeff King
On Thu, Jun 14, 2018 at 02:31:03PM +0200, SZEDER Gábor wrote:

> 't5561-http-backend.sh' is prone to occasional failures; luckily it's
> not 'git-http-backend's fault, but the test script is a bit racy.  I
> won't go into the details here, patch 4/4's commit message discusses
> it at length.  4/4 also fixes this issue, but I'm not particularly
> happy with that fix, the note attached to that patch explains why,
> along with possible alternatives, hence the RFC.
> 
> Even if we settle for a different fix, I think the first two patches
> are worthy cleanups in their own right.

Yes, the first two look like improvements on their own (and I left more
detailed comments on the final one). Thanks for tackling this!

-Peff


Re: [RFC PATCH 0/4] Fix occasional test failures in http tests

2018-06-14 Thread Junio C Hamano
SZEDER Gábor  writes:

> 't5561-http-backend.sh' is prone to occasional failures; luckily it's
> not 'git-http-backend's fault, but the test script is a bit racy.  I
> won't go into the details here, patch 4/4's commit message discusses
> it at length.  4/4 also fixes this issue, but I'm not particularly
> happy with that fix, the note attached to that patch explains why,
> along with possible alternatives, hence the RFC.

Interesting.

> Even if we settle for a different fix, I think the first two patches
> are worthy cleanups in their own right.

Yeah, the first two are unquestionably good.

The sed expression in the second one could use single quotes (instead of
double quotes) to lose the hard-to-read backslash quoting, but it is
probably OK to keep the second one a pure "refactoring" patch, and leave
such a change to a separate "further cleanup" patch.
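The readability difference being pointed at can be illustrated like this
(the expression here is made up, not the one in the patch):

```shell
line='GET /info/refs HTTP/1.1 "curl"'
# double quotes: every embedded " needs a backslash to survive the shell
echo "$line" | sed "s/\"curl\"/-/"
# single quotes: the expression reads exactly as sed will see it
echo "$line" | sed 's/"curl"/-/'
```

Both commands print `GET /info/refs HTTP/1.1 -`; only the second lets you
read the sed expression off the page without mentally unescaping it.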

> SZEDER Gábor (4):
>   t5541: avoid empty line when truncating access log
>   t/lib-httpd: add the strip_access_log() helper function
>   t/lib-httpd: add minor and patch version to $HTTPD_VERSION
>   t/lib-httpd: sort log based on timestamp to avoid occasional failure
>
>  t/lib-httpd.sh  | 24 ++--
>  t/lib-httpd/apache.conf |  5 +
>  t/t5541-http-push-smart.sh  | 17 +++--
>  t/t5551-http-fetch-smart.sh |  7 +--
>  t/t5561-http-backend.sh |  7 +--
>  5 files changed, 32 insertions(+), 28 deletions(-)


[RFC PATCH 0/4] Fix occasional test failures in http tests

2018-06-14 Thread SZEDER Gábor
't5561-http-backend.sh' is prone to occasional failures; luckily it's
not 'git-http-backend's fault, but the test script is a bit racy.  I
won't go into the details here, patch 4/4's commit message discusses
it at length.  4/4 also fixes this issue, but I'm not particularly
happy with that fix, the note attached to that patch explains why,
along with possible alternatives, hence the RFC.

Even if we settle for a different fix, I think the first two patches
are worthy cleanups in their own right.
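The idea behind 4/4 can be sketched on fabricated access-log lines (the
real patch deals with Apache's actual log and is more careful): field 4
holds the `[dd/Mon/yyyy:HH:MM:SS` part, and within a single test run the
date is constant, so a lexicographic sort of that field equals
chronological order:

```shell
# two log entries that arrived out of order; sort them by timestamp
printf '%s\n' \
  '127.0.0.1 - - [14/Jun/2018:14:31:05 +0200] "GET /b HTTP/1.1" 200' \
  '127.0.0.1 - - [14/Jun/2018:14:31:03 +0200] "GET /a HTTP/1.1" 200' |
  sort -t ' ' -k 4
```

The `:03` line sorts first, restoring the order the requests were served
in even when the web server wrote them to the log out of order.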

SZEDER Gábor (4):
  t5541: avoid empty line when truncating access log
  t/lib-httpd: add the strip_access_log() helper function
  t/lib-httpd: add minor and patch version to $HTTPD_VERSION
  t/lib-httpd: sort log based on timestamp to avoid occasional failure

 t/lib-httpd.sh  | 24 ++--
 t/lib-httpd/apache.conf |  5 +
 t/t5541-http-push-smart.sh  | 17 +++--
 t/t5551-http-fetch-smart.sh |  7 +--
 t/t5561-http-backend.sh |  7 +--
 5 files changed, 32 insertions(+), 28 deletions(-)

-- 
2.18.0.rc0.207.ga6211da864



[PATCH 0/4] color.ui docs & add color.ui=isatty

2018-05-30 Thread Ævar Arnfjörð Bjarmason
I started writing this for the reasons explained in 4/4, but the
justification there is a bit of a stretch: it is not clear that we would
realistically want color.ui=auto and color.pager=false to diverge. But
since I wrote it already, maybe some people will come out of the woodwork
telling me that this is what they've always wanted, for whatever reason.

Maybe if we don't like 4/4 we should take 3/4 anyway, because we might
want another such option in the future, but that's probably overly
hedging our bets. Still, I really don't like this pattern of a
multi-value config option that dies instead of warning on unknown future
values, so I'm leaning towards saying that 3/4 should be accepted into
git.

But while I was working towards 4/4 I did some nice fixes in patches 1/4
and 2/4. I think those should go in regardless, so they're non-RFC.

Ævar Arnfjörð Bjarmason (4):
  config doc: move color.ui documentation to one place
  config doc: clarify "to a terminal" in color.ui
  color.ui config: don't die on unknown values
  color.ui config: add "isatty" setting

 Documentation/config.txt | 100 +--
 color.c  |  25 --
 color.h  |   1 +
 t/t7006-pager.sh |  31 
 4 files changed, 107 insertions(+), 50 deletions(-)

-- 
2.17.0.290.gded63e768a



Re: [PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation

2018-05-28 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> On Thu, May 24, 2018 at 9:02 PM, Jeff King  wrote:
>> On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> That doesn't work, because that's for the server-side, but I need the
>>> fetch.fsck.* that doesn't exist. This works (I'll send a better patch
>>> with tests / docs etc. soon):
>>
>> Yeah, I think this is the right direction.

It seems that this 4-patch series is sufficiently commented on, and the
earlier parts can be further polished soonish with help from others.  I
need to re-read the last patch (the implementation), but I think this
generally is good.

Thanks.


[PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation

2018-05-24 Thread Ævar Arnfjörð Bjarmason
On Thu, May 24, 2018 at 9:02 PM, Jeff King  wrote:
> On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> That doesn't work, because that's for the server-side, but I need the
>> fetch.fsck.* that doesn't exist. This works (I'll send a better patch
>> with tests / docs etc. soon):
>
> Yeah, I think this is the right direction.
>
>> +static int fetch_pack_config_cb(const char *var, const char *value, void *cb)
>> +{
>> +     if (strcmp(var, "fetch.fsck.skiplist") == 0) {
>> +             const char *path;
>> +
>> +             if (git_config_pathname(&path, var, value))
>> +                     return 1;
>> +             strbuf_addf(&fsck_msg_types, "%cskiplist=%s",
>> +                     fsck_msg_types.len ? ',' : '=', path);
>> +             free((char *)path);
>> +             return 0;
>> +     }
>> +
>> +     if (skip_prefix(var, "fetch.fsck.", &var)) {
>> +             if (is_valid_msg_type(var, value))
>> +                     strbuf_addf(&fsck_msg_types, "%c%s=%s",
>> +                             fsck_msg_types.len ? ',' : '=', var, value);
>> +             else
>> +                     warning("Skipping unknown msg id '%s'", var);
>> +             return 0;
>> +     }
>
> This matches what's in receive-pack, though the way we stuff all of the
> options into a flat string is kind of nasty. I also wonder if we'd
> eventually run up against command-line limits if somebody had a
> complicated fsck config.

Yeah, but I'm leaving this for the future. I doubt that it's going to
happen in practice, although if you have a huge skip-list...

> I wonder if we should simply be telling index-pack "please read fsck
> config from the prefix 'fetch.fsck'" instead.

I think this whole notion of reading the same config from two places
sucks, and now with my patches it's three. But I can't think of a
reasonable way to make it better without even more complexity, and
maybe it's better to split it up anyway, i.e. the stuff you want to
accept is different between fsck and fetch.

> I dunno, maybe I am just creating work. Certainly I don't think it
> should be a blocker for adding fetch.fsck support. But if you want to
> think about it while you are here or write a patch, I wouldn't complain. :)

Sorry, too late. I already wrote all of this :)

Ævar Arnfjörð Bjarmason (4):
  config doc: don't describe *.fetchObjects twice
  config doc: unify the description of fsck.* and receive.fsck.*
  config doc: elaborate on what transfer.fsckObjects does
  fetch: implement fetch.fsck.*

 Documentation/config.txt| 113 
 fetch-pack.c|  32 -
 t/t5504-fetch-receive-strict.sh |  46 +
 3 files changed, 148 insertions(+), 43 deletions(-)

-- 
2.17.0.290.gded63e768a



Re: [GSoC][PATCH 0/4] rebase: split rebase -p from rebase -i

2018-05-22 Thread Stefan Beller
On Tue, May 22, 2018 at 6:31 AM, Alban Gruin  wrote:
> This splits the `rebase --preserve-merges` functionality from
> git-rebase--interactive.sh. This is part of the effort to deprecate
> preserve-merges. The new script, git-rebase--preserve-merges.sh, should
> be left to bitrot. All the dead code left by the duplication of
> git-rebase--interactive.sh is also removed.

... and I thought the original motivation was getting the rest of rebase
into a shape where rewriting it is easier; the potential bit rot of
--preserve-merges is rather a side effect, but not the main goal.

I commented on patch 1, as I don't quite understand the changes,
the other patches look good to me.

Thanks,
Stefan


[GSoC][PATCH 0/4] rebase: split rebase -p from rebase -i

2018-05-22 Thread Alban Gruin
This splits the `rebase --preserve-merges` functionality from
git-rebase--interactive.sh. This is part of the effort to deprecate
preserve-merges. The new script, git-rebase--preserve-merges.sh, should be
left to bitrot. All the dead code left by the duplication of
git-rebase--interactive.sh is also removed.

This patch series is based on js/sequencer-and-root-commits.

Alban Gruin (4):
  rebase: duplicate git-rebase--interactive.sh to
git-rebase--preserve-merges.sh
  rebase: strip unused code in git-rebase--preserve-merges.sh
  rebase: use the new git-rebase--preserve-merges.sh
  rebase: remove -p code from git-rebase--interactive.sh

 .gitignore |1 +
 Makefile   |2 +
 git-rebase--interactive.sh |  708 +---
 git-rebase--preserve-merges.sh | 1004 
 git-rebase.sh  |   32 +-
 5 files changed, 1040 insertions(+), 707 deletions(-)
 create mode 100644 git-rebase--preserve-merges.sh

-- 
2.16.1



Re: [PATCH 0/4] a few mark_parents_uninteresting cleanups

2018-05-14 Thread Derrick Stolee

On 5/11/2018 2:00 PM, Jeff King wrote:

This is a follow-up to the discussion from February:

   
https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/

There I theorized that some of these extra has_object_file() checks were
unnecessary. After poking around a bit, I've convinced myself that this
is the case, so here are some patches.

After Stolee's fix in ebbed3ba04 (revision.c: reduce object database
queries, 2018-02-24), I doubt this will provide any noticeable speedup.
IMHO the value is just in simplifying the logic.

The first two patches are the actual has_object_file simplifications.
The second two are an attempt to fix some head-scratching I had while
reading mark_parents_uninteresting(). I hope the result is easier to
follow, but I may have just shuffled one confusion for another. :)

   [1/4]: mark_tree_contents_uninteresting(): drop missing object check
   [2/4]: mark_parents_uninteresting(): drop missing object check
   [3/4]: mark_parents_uninteresting(): replace list with stack
   [4/4]: mark_parents_uninteresting(): avoid most allocation

  revision.c | 90 ++
  1 file changed, 50 insertions(+), 40 deletions(-)

-Peff


This series looks good to me. I found patch 3 hard to read in the diff,
so I just looked at the final result; the new arrangement is very clear
about how it should behave.


Reviewed-by: Derrick Stolee 


[PATCH 0/4] a few mark_parents_uninteresting cleanups

2018-05-11 Thread Jeff King
This is a follow-up to the discussion from February:

  
https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/

There I theorized that some of these extra has_object_file() checks were
unnecessary. After poking around a bit, I've convinced myself that this
is the case, so here are some patches.

After Stolee's fix in ebbed3ba04 (revision.c: reduce object database
queries, 2018-02-24), I doubt this will provide any noticeable speedup.
IMHO the value is just in simplifying the logic.

The first two patches are the actual has_object_file simplifications.
The second two are an attempt to fix some head-scratching I had while
reading mark_parents_uninteresting(). I hope the result is easier to
follow, but I may have just shuffled one confusion for another. :)
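For context (not part of the patches themselves), the UNINTERESTING flag
that mark_parents_uninteresting() propagates through the commit graph is
what powers negative revisions on the command line; a tiny demo in a
throwaway repository, with made-up names:

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m one
git tag old
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m two
# '^old' marks 'old' and all of its ancestors UNINTERESTING, so only
# commits reachable from HEAD but not from 'old' are listed
git log --format=%s HEAD ^old
```

This prints only `two`; the series is about how those UNINTERESTING
marks are spread to parent commits, not about changing this behavior.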

  [1/4]: mark_tree_contents_uninteresting(): drop missing object check
  [2/4]: mark_parents_uninteresting(): drop missing object check
  [3/4]: mark_parents_uninteresting(): replace list with stack
  [4/4]: mark_parents_uninteresting(): avoid most allocation

 revision.c | 90 ++
 1 file changed, 50 insertions(+), 40 deletions(-)

-Peff


Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-05-10 Thread Jeff King
On Wed, Apr 18, 2018 at 01:24:55PM +0900, Junio C Hamano wrote:

> Martin Ågren  writes:
> 
> > This is a patch series to convert \-- to -- in our documentation. The
> > first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> > --option, 2015-05-13) to fix some instances that have appeared since.
> > The other three patches deal with standalone "\--" which we can't
> > always turn into "--" since it can be rendered as an em dash.
> 
> All looked sensible.  As you mentioned in [2/4], "\--::" that is
> part of an enumeration appears in the documentation for about a dozen
> commands after the series, but I do not think we can avoid it.
> 
> One thing these patches make me wonder about is whether the newer
> AsciiDoc we assume lets us do without the {litdd} macro.  This series
> and our earlier efforts like 1c262bb7 ("doc: convert \--option to
> --option", 2015-05-13) mention that "\--word" is less pleasant on the
> eyes than "--word", but the ugliness "two{litdd}words" has over
> "two--words" is far worse than that, so...

I think many cases that use {litdd} would be better off using literal
backticks anyway (e.g., git-add.txt mentions the filename
`git-add--interactive.perl`).

There are certainly a few that can't, though (e.g., config.txt uses
linkgit:git-web{litdd}browse[1]).  I agree that "\--" is less ugly there
(and seems to work on my modern asciidoc). There's some history on the
litdd versus "\--" choice in 565e135a1e (Documentation: quote
double-dash for AsciiDoc, 2011-06-29). That in turn references the
2839478774 (Work around em-dash handling in newer AsciiDoc, 2010-08-23),
but I wouldn't be surprised if all of that is now obsolete with our
AsciiDoc 8+ requirement.

-Peff

PS Late review, I know, but the patches look good to me. :)


Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Ævar Arnfjörð Bjarmason

On Tue, May 01 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> I think at this point git-subtree is widely used enough to move out of
>> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.
>
> Sure, it is used widely enough.
>
> However, it flies in the face of so many GSoC efforts to introduce yet
> another one of those poorly portable Unix shell scripts, as central part
> of Git's code base.
>
> The script itself does look quite straight-forward to port to a builtin,
> so why not give it a try?

That's a valid point. I think it makes sense to leave that aside for
now, maybe the consensus is that subtree is fine in every way except
we'd like to have a policy not to introduce new shellscript built-ins.

Let's first just assume it's in C already and look at it in terms of its
functionality, to figure out if it's worth even getting to that point.

> If you are completely opposed to porting it to C, I will be completely
> opposed to moving it out of contrib/.

This series shows that we should split the concern about whether
something lives in contrib/ from whether it's built/installed by
default.

No matter if we decide that subtree should be a blessed default command
it makes sense to move it out of contrib, purely because as can be seen
from this series it'll replace >100 lines of hacks with 1 line in our
main Makefile.

We can then just e.g. add a flag to guard for it,
e.g. CONTRIB_SUBTREE=YesPlease.

But that's just an internal implementation detail of how we manage code
sitting in git.git.


Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Johannes Schindelin
Hi Ævar,

On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote:

> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

Sure, it is used widely enough.

However, it flies in the face of so many GSoC efforts to introduce yet
another one of those poorly portable Unix shell scripts, as central part
of Git's code base.

The script itself does look quite straight-forward to port to a builtin,
so why not give it a try?

If you are completely opposed to porting it to C, I will be completely
opposed to moving it out of contrib/.

If you need help with porting it, please come up with a task plan and I
can jump in, to help (but please do collaborate with me on this one, don't
leave all of the hard work to me).

Ciao,
Dscho

Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 11:50 AM, Ævar Arnfjörð Bjarmason
 wrote:
> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

After narrow/partial clone becomes real, it should be "easy" to
implement some sort of narrow checkout that would achieve the same
thing. But it has taken me forever, with all the other stuff, to get
back to this.

If we remove it from contrib and there are people willing to
update/maintain it, should it be a separate repository then? The
willing people will have much more freedom to update it. And I don't
have to answer the questions about who will maintain this thing in
git.git. I don't like the idea of adding new (official) shell-based
commands either to be honest.
-- 
Duy


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Avery Pennarun
On Mon, Apr 30, 2018 at 6:21 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Pretty clear it's garbage data, unless we're to believe that the
> relative interest of submodules in the US, Germany and Sweden is 51, 64
> & 84, but 75, 100 and 0 for subtree.

Oh yeah, Swedish people hate git-subtree.  Nobody knows why.

Avery


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Ævar Arnfjörð Bjarmason

On Mon, Apr 30 2018, Avery Pennarun wrote:

> On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller  wrote:
> There's one exception, which is doing a one-time permanent merge of
> two projects into one.  That's a nice feature, but is probably used
> extremely rarely.

FWIW this is the only thing I've used it for. I do this occasionally and
used to do this manually with format-patch + "perl -pe" before or
similar when I needed to merge some repositories together, and then some
other times I was less stupid and manually started doing something
similar to what subtree is doing with a "move everything" commit just
before the merge of the two histories.

>> https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule
>>
>> Not sure what to make of this data.
>
> Clearly people need a lot more help when using submodules than when
> using subtree :)

Pretty clear it's garbage data, unless we're to believe that the
relative interest of submodules in the US, Germany and Sweden is 51, 64
& 84, but 75, 100 and 0 for subtree.


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Stefan Beller
On Mon, Apr 30, 2018 at 2:53 PM, Avery Pennarun  wrote:

> For the best of both worlds, I've often thought that a good balance
> would be to use the same data structure that submodule uses, but to
> store all the code in a single git repo under different refs, which we
> might or might not download (or might or might not have different
> ACLs) under different circumstances.

There has been some experimentation with having a simpler ref
surface on the submodule side,
https://public-inbox.org/git/cover.1512168087.git.jonathanta...@google.com/

The way you describe the future of submodules, all we'd have to do
is to teach git-clone how to select the "interesting" refs for your use
case. Any other command would assume all submodule data to be
in the main repository.

The difference to Jonathan's proposal linked above would be that the
object store is in the main repo and the refs are prefixed per
submodule instead of "shadowed".

>  However, when some projects get
> really huge (lots of very big submodule dependencies), then repacking
> one-big-repo starts becoming unwieldy; in that situation git-subtree
> also fails completely.

Yes, but that is a general scaling problem of Git that could be tackled,
e.g. repack into multiple packs serially instead of putting everything
into one pack.

>> Submodules do not need to produce a synthetic project history
>> when splitting off again, as the history is genuine. This allows
>> for easier work with upstream.
>
> Splitting for easier work upstream is great, and there really ought to
> be an official version of 'git subtree split', which is good for all
> sorts of purposes.
>
> However, I suspect almost all uses of the split feature are a)
> splitting a subtree that you previously merged in, or b) splitting a
> subtree into a separate project that you want to maintain separately
> from now on.  Repeated splits in case (a) are only necessary because
> you're not using submodules, or in case (b) are only necessary because
> you didn't *switch* to submodules when it finally came time to split
> the projects.  (In both cases you probably didn't switch to submodules
> because you didn't like one of its tradeoffs, especially the need to
> track multiple repos when you fork.)

That makes sense.

>
> There's one exception, which is doing a one-time permanent merge of
> two projects into one.  That's a nice feature, but is probably used
> extremely rarely.  More often people get into a
> merge-split-merge-split cycle that would be better served by a
> slightly improved git-submodule.

This rare use case is how git-subtree came into existence in git's
contrib directory AFAICT,
https://kernel.googlesource.com/pub/scm/git/git/+/634392b26275fe5436c0ea131bc89b46476aa4ae
which is interesting to view in git-show, but I think defaults could
be tweaked there, as it currently shows me mostly a license file.

>> Conceptually Gerrit is doing
>>
>>   while true:
>>       git submodule update --remote
>>       if worktree is dirty:
>>           git commit "update the submodules"
>>
>> just that Gerrit doesn't poll but does it event based.
>
> ...and it's super handy :)  The problem is it's fundamentally
> centralized: because gerrit can serialize merges into the submodule,
> it also knows exactly how to update the link in the supermodule.  If
> there was wild branching and merging (as there often is in git) and
> you had to resolve conflicts between two submodules, I don't think it
> would be obvious at all how to do it automatically when pushing a
> submodule.  (This also works quite badly with git subtree --squash.)

With the poll based solution I don't think you'd run into many more
problems than you would with Gerrits solution.

In a nearby thread, we were just discussing the submodule merging
strategies,
https://public-inbox.org/git/1524739599.20251.17.ca...@klsmartin.com/
which might seem confusing, but the implementation is actually easy,
as we only ever fast-forward in submodules.

>>
>> https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule
>>
>> Not sure what to make of this data.
>
> Clearly people need a lot more help when using submodules than when
> using subtree :)

That could be true. :)

Thanks,
Stefan


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Avery Pennarun
On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller  wrote:
> On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun  wrote:
> No objections from me either.
>
> Submodules seem to serve a slightly different purpose, though?

I think the purpose is actually the same; it's just the tradeoffs
that are different.  Both sets of tradeoffs kind of suck.

> With Subtrees the superproject always contains all the code,
> even when you squash the subtree history when merging it in.
> In the submodule world, you may not have access to one of the
> submodules.

Right.  Personally I think it's a disadvantage of subtree that it
always contains all the code (what if some people don't want the code
for a particular build variant?).  However, it's a huge pain that
submodules *don't* contain all the code (what if I'm not online right
now, or the site supposedly containing the code goes offline, or I
want to make my own fork?).

For the best of both worlds, I've often thought that a good balance
would be to use the same data structure that submodule uses, but to
store all the code in a single git repo under different refs, which we
might or might not download (or might or might not have different
ACLs) under different circumstances.  However, when some projects get
really huge (lots of very big submodule dependencies), then repacking
one-big-repo starts becoming unwieldy; in that situation git-subtree
also fails completely.

> Submodules do not need to produce a synthetic project history
> when splitting off again, as the history is genuine. This allows
> for easier work with upstream.

Splitting for easier work upstream is great, and there really ought to
be an official version of 'git subtree split', which is good for all
sorts of purposes.

However, I suspect almost all uses of the split feature are a)
splitting a subtree that you previously merged in, or b) splitting a
subtree into a separate project that you want to maintain separately
from now on.  Repeated splits in case (a) are only necessary because
you're not using submodules, or in case (b) are only necessary because
you didn't *switch* to submodules when it finally came time to split
the projects.  (In both cases you probably didn't switch to submodules
because you didn't like one of its tradeoffs, especially the need to
track multiple repos when you fork.)

> Subtrees present you the whole history by default and the user
> needs to be explicit about not wanting to see history from the
> subtree, which is the opposite of submodules (though this
> may be planned in the future to switch).

It turns out that AFAIK, almost everyone prefers 'git subtree
--squash', which squashes into a single commit each time you merge,
much like git submodule does.  I doubt people would cry too much if
the full-history feature went away.

There's one exception, which is doing a one-time permanent merge of
two projects into one.  That's a nice feature, but is probably used
extremely rarely.  More often people get into a
merge-split-merge-split cycle that would be better served by a
slightly improved git-submodule.

>> The gerrit team (eg. Stefan Beller) has been doing some really great
>> stuff to make submodules more usable by helping with relative
>> submodule links and by auto-updating links in supermodules at the
>> right times.  Unfortunately doing that requires help from the server
>> side, which kind of messes up decentralization and so doesn't solve
>> the problem in the general case.
>
> Conceptually Gerrit is doing
>
>   while true:
>       git submodule update --remote
>       if worktree is dirty:
>           git commit "update the submodules"
>
> just that Gerrit doesn't poll but does it event based.

...and it's super handy :)  The problem is it's fundamentally
centralized: because gerrit can serialize merges into the submodule,
it also knows exactly how to update the link in the supermodule.  If
there was wild branching and merging (as there often is in git) and
you had to resolve conflicts between two submodules, I don't think it
would be obvious at all how to do it automatically when pushing a
submodule.  (This also works quite badly with git subtree --squash.)

>> I really wish there were a good answer, but I don't know what it is.
>> I do know that lots of people seem to at least be happy using
>> git-subtree, and would be even happier if it were installed
>> automatically with git.
>
> https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule
>
> Not sure what to make of this data.

Clearly people need a lot more help when using submodules than when
using subtree :)

Have fun,

Avery


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Stefan Beller
On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun  wrote:
> On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> I think at this point git-subtree is widely used enough to move out of
>> contrib/, maybe others disagree, but patches are always better for
>> discussion that patch-less ML posts.
>
> I really was hoping git-subtree would be completely obsoleted by
> git-submodule by now, but... it hasn't been, so... no objections on
> this end.

No objections from me either.

Submodules seem to serve a slightly different purpose, though?

With Subtrees the superproject always contains all the code,
even when you squash the subtree history when merging it in.
In the submodule world, you may not have access to one of the
submodules.

Submodules do not need to produce a synthetic project history
when splitting off again, as the history is genuine. This allows
for easier work with upstream.

Subtrees present you the whole history by default and the user
needs to be explicit about not wanting to see history from the
subtree, which is the opposite of submodules (though this
may be planned in the future to switch).

> The gerrit team (eg. Stefan Beller) has been doing some really great
> stuff to make submodules more usable by helping with relative
> submodule links and by auto-updating links in supermodules at the
> right times.  Unfortunately doing that requires help from the server
> side, which kind of messes up decentralization and so doesn't solve
> the problem in the general case.

Conceptually Gerrit is doing

  while true:
      git submodule update --remote
      if worktree is dirty:
          git commit "update the submodules"

just that Gerrit doesn't poll but does it event-based.
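The loop above can be sketched in a few lines. This is a hypothetical toy of the idea, not Gerrit's actual code; the command runner is injectable so the control flow can be exercised without a real repository, and the commit message is made up:

```python
import subprocess

def superproject_autoupdate(run=subprocess.run):
    """One iteration of the conceptual loop: bump the submodule
    pointers, then commit in the superproject only if that made the
    worktree dirty. Returns True if a commit was made."""
    run(["git", "submodule", "update", "--remote"], check=True)
    status = run(["git", "status", "--porcelain"],
                 capture_output=True, text=True, check=True).stdout
    if status.strip():
        run(["git", "commit", "-am", "update the submodules"], check=True)
        return True
    return False
```

An event-driven system like Gerrit would call this once per submodule push instead of polling it in a loop.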

> I really wish there were a good answer, but I don't know what it is.
> I do know that lots of people seem to at least be happy using
> git-subtree, and would be even happier if it were installed
> automatically with git.

https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule

Not sure what to make of this data.


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Avery Pennarun
On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason
 wrote:
> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
discussion than patch-less ML posts.

I really was hoping git-subtree would be completely obsoleted by
git-submodule by now, but... it hasn't been, so... no objections on
this end.

The gerrit team (eg. Stefan Beller) has been doing some really great
stuff to make submodules more usable by helping with relative
submodule links and by auto-updating links in supermodules at the
right times.  Unfortunately doing that requires help from the server
side, which kind of messes up decentralization and so doesn't solve
the problem in the general case.

I really wish there were a good answer, but I don't know what it is.
I do know that lots of people seem to at least be happy using
git-subtree, and would be even happier if it were installed
automatically with git.

Have fun,

Avery


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Philip Oakley

From: "Ævar Arnfjörð Bjarmason" 

I think at this point git-subtree is widely used enough to move out of
contrib/, maybe others disagree, but patches are always better for
discussion than patch-less ML posts.



Assuming this lands in Git, then there will also need to be a simple 
follow-on to Duy's series that updates the command-list.txt (Message-Id: 
<20180429181844.21325-10-pclo...@gmail.com>). Duy's series also does the 
completions thing IIUC;-).

--
Philip


Ævar Arnfjörð Bjarmason (4):
 git-subtree: move from contrib/subtree/
 subtree: remove support for git version <1.7
 subtree: fix a test failure under GETTEXT_POISON
 i18n: translate the git-subtree command

.gitignore|   1 +
Documentation/git-submodule.txt   |   2 +-
.../subtree => Documentation}/git-subtree.txt |   3 +
Makefile  |   1 +
contrib/subtree/.gitignore|   7 -
contrib/subtree/COPYING   | 339 --
contrib/subtree/INSTALL   |  28 --
contrib/subtree/Makefile  |  97 -
contrib/subtree/README|   8 -
contrib/subtree/t/Makefile|  86 -
contrib/subtree/todo  |  48 ---
.../subtree/git-subtree.sh => git-subtree.sh  | 109 +++---
{contrib/subtree/t => t}/t7900-subtree.sh |  21 +-
13 files changed, 78 insertions(+), 672 deletions(-)
rename {contrib/subtree => Documentation}/git-subtree.txt (99%)
delete mode 100644 contrib/subtree/.gitignore
delete mode 100644 contrib/subtree/COPYING
delete mode 100644 contrib/subtree/INSTALL
delete mode 100644 contrib/subtree/Makefile
delete mode 100644 contrib/subtree/README
delete mode 100644 contrib/subtree/t/Makefile
delete mode 100644 contrib/subtree/todo
rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%)
rename {contrib/subtree/t => t}/t7900-subtree.sh (99%)

--
2.17.0.290.gded63e768a






[PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Ævar Arnfjörð Bjarmason
I think at this point git-subtree is widely used enough to move out of
contrib/, maybe others disagree, but patches are always better for
discussion than patch-less ML posts.

Ævar Arnfjörð Bjarmason (4):
  git-subtree: move from contrib/subtree/
  subtree: remove support for git version <1.7
  subtree: fix a test failure under GETTEXT_POISON
  i18n: translate the git-subtree command

 .gitignore|   1 +
 Documentation/git-submodule.txt   |   2 +-
 .../subtree => Documentation}/git-subtree.txt |   3 +
 Makefile  |   1 +
 contrib/subtree/.gitignore|   7 -
 contrib/subtree/COPYING   | 339 --
 contrib/subtree/INSTALL   |  28 --
 contrib/subtree/Makefile  |  97 -
 contrib/subtree/README|   8 -
 contrib/subtree/t/Makefile|  86 -
 contrib/subtree/todo  |  48 ---
 .../subtree/git-subtree.sh => git-subtree.sh  | 109 +++---
 {contrib/subtree/t => t}/t7900-subtree.sh |  21 +-
 13 files changed, 78 insertions(+), 672 deletions(-)
 rename {contrib/subtree => Documentation}/git-subtree.txt (99%)
 delete mode 100644 contrib/subtree/.gitignore
 delete mode 100644 contrib/subtree/COPYING
 delete mode 100644 contrib/subtree/INSTALL
 delete mode 100644 contrib/subtree/Makefile
 delete mode 100644 contrib/subtree/README
 delete mode 100644 contrib/subtree/t/Makefile
 delete mode 100644 contrib/subtree/todo
 rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%)
 rename {contrib/subtree/t => t}/t7900-subtree.sh (99%)

-- 
2.17.0.290.gded63e768a



Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-04-18 Thread brian m. carlson
On Tue, Apr 17, 2018 at 09:15:25PM +0200, Martin Ågren wrote:
> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

This series looked sane to me as well.  I appreciate the work to keep
our documentation working in both AsciiDoc and Asciidoctor.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-04-17 Thread Junio C Hamano
Martin Ågren  writes:

> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

All looked sensible.  As you mentioned in [2/4], "\--::" that is
part of an enumeration appears in documentation for about a dozen
commands after the series, but I do not think we can avoid it.

One thing that makes me wonder related to these patches is if a
newer AsciiDoc we assume lets us do without {litdd} macro.  This
series and our earlier effort like 1c262bb7 ("doc: convert \--option
to --option", 2015-05-13) mentions that "\--word" is less pleasant
on the eyes than "--word", but the ugliness "two{litdd}words" has
over "two--words" is far worse than that, so...

Thanks, will queue.




[PATCH 0/4] doc: cleaning up instances of \--

2018-04-17 Thread Martin Ågren
This is a patch series to convert \-- to -- in our documentation. The
first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
--option, 2015-05-13) to fix some instances that have appeared since.
The other three patches deal with standalone "\--" which we can't
always turn into "--" since it can be rendered as an em dash.

Martin

Martin Ågren (4):
  doc: convert \--option to --option
  doc: convert [\--] to [--]
  git-[short]log.txt: unify quoted standalone --
  git-submodule.txt: quote usage in monospace, drop backslash

 Documentation/git-format-patch.txt | 2 +-
 Documentation/git-log.txt  | 8 
 Documentation/git-push.txt | 2 +-
 Documentation/git-shortlog.txt | 6 +++---
 Documentation/git-submodule.txt| 4 ++--
 Documentation/gitk.txt | 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

-- 
2.17.0.252.gfe0a9eaf31



Re: [PATCH 0/4] Convert some stash functionality to a builtin

2018-03-25 Thread Thomas Gummerer
On 03/24, Joel Teichroeb wrote:
> I've been working on converting all of git stash to be a
> builtin, however it's hard to get it all working at once with
> limited time, so I've moved around half of it to a new
> stash--helper builtin and called these functions from the shell
> script. Once this is stabilized, it should be easier to convert
> the rest of the commands one at a time without breaking
> anything.
> 
> I've sent most of this code before, but that was targeting a
> full replacement of stash. The code is overall the same, but
> with some code review changes and updates for internal api
> changes.

Thanks for splitting this up into multiple patches, I found that much
more pleasant to review, and thanks for your continued work on this :)

> Since there seems to be interest from GSOC students who want to
> work on converting builtins, I figured I should finish what I
> have that works now so they could build on top of it.
> 
> Joel Teichroeb (4):
>   stash: convert apply to builtin
>   stash: convert branch to builtin
>   stash: convert drop and clear to builtin
>   stash: convert pop to builtin
> 
>  .gitignore  |   1 +
>  Makefile|   1 +
>  builtin.h   |   1 +
>  builtin/stash--helper.c | 514 
> 
>  git-stash.sh|  13 +-
>  git.c   |   1 +
>  6 files changed, 526 insertions(+), 5 deletions(-)
>  create mode 100644 builtin/stash--helper.c
> 
> -- 
> 2.16.2
> 


[PATCH 0/4] Convert some stash functionality to a builtin

2018-03-24 Thread Joel Teichroeb
I've been working on converting all of git stash to be a
builtin, however it's hard to get it all working at once with
limited time, so I've moved around half of it to a new
stash--helper builtin and called these functions from the shell
script. Once this is stabilized, it should be easier to convert
the rest of the commands one at a time without breaking
anything.

I've sent most of this code before, but that was targeting a
full replacement of stash. The code is overall the same, but
with some code review changes and updates for internal api
changes.

Since there seems to be interest from GSOC students who want to
work on converting builtins, I figured I should finish what I
have that works now so they could build on top of it.

Joel Teichroeb (4):
  stash: convert apply to builtin
  stash: convert branch to builtin
  stash: convert drop and clear to builtin
  stash: convert pop to builtin

 .gitignore  |   1 +
 Makefile|   1 +
 builtin.h   |   1 +
 builtin/stash--helper.c | 514 
 git-stash.sh|  13 +-
 git.c   |   1 +
 6 files changed, 526 insertions(+), 5 deletions(-)
 create mode 100644 builtin/stash--helper.c

-- 
2.16.2



Re: [PATCH 0/4] Speed up git tag --contains

2018-03-12 Thread Jeff King
On Mon, Mar 12, 2018 at 09:45:27AM -0400, Derrick Stolee wrote:

> > diff --git a/builtin/branch.c b/builtin/branch.c
> > index 8dcc2ed058..4d674e86d5 100644
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, 
> > struct ref_sorting *sortin
> > memset(&array, 0, sizeof(array));
> > +   filter->with_commit_tag_algo = 1;
> > filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
> > if (filter->verbose)
> > 
> > drops my run of "git branch -a --contains HEAD~100" from 8.6s to
> > 0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
> > a smaller number of branches, we may actually end up slower (because we
> > dig further down in history, and don't benefit from the multiple-branch
> > speedup).
> 
> It's good to know that we already have an algorithm for the multi-head
> approach. Things like `git branch -vv` are harder to tease out because the
> graph walk is called by the line-format code.

Yeah, the ahead/behind stuff will need some work. Part of it is just
code structuring. We know ahead of time which branches (and their
upstreams) are going to need this ahead/behind computation, so we should
be able to do collect them all for a single call.

But I'm not sure if a general multi-pair ahead/behind is going to be
easy. I don't have even experimental code for that. :)

We have a multi-pair ahead/behind command which we use at GitHub, but it
does each pair separately. It leans heavily on reachability bitmaps, so
the main advantage is that it's able to amortize the cost of loading the
bitmaps (both off disk, but also we sometimes have to walk to complete
the bitmaps).

-Peff


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-12 Thread Derrick Stolee

On 3/3/2018 12:15 AM, Jeff King wrote:

On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:


This is a resubmission of Jeff King's patch series to speed up git tag
--contains with some changes. It's been cooking for a while as:

Replying to this 6-year-old thread:

Is there any chance this could be resurrected?  We are using
phabricator, which uses `git branch --contains` as part of its
workflow.  Our repo has ~1000 branches on it, and the contains
operation is eating up all our CPU (and time).  It would be very
helpful to us to make this faster!

(The original thread is at
https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/

Sorry, this got thrown on my "to respond" pile and languished.


Thanks for adding me to the thread. It's good to know the pain points 
people are having around commit-graph walks.



There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base
traversals, which may walk over the same commits multiple times.

These days "tag --contains" uses a different algorithm that can look at
all heads in a single traversal. But the downside is that it's
depth-first, so it tends to walk down to the roots. That's generally OK
for tags, since you often have ancient tags that mean getting close to
the roots anyway.

But for branches, they're more likely to be recent, and you can get away
without going very deep into the history.

So it's a tradeoff. There's no run-time switch to flip between them, but
a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, 
struct ref_sorting *sortin
  
  	memset(&array, 0, sizeof(array));
  
+	filter->with_commit_tag_algo = 1;

filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
  
  	if (filter->verbose)


drops my run of "git branch -a --contains HEAD~100" from 8.6s to
0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
a smaller number of branches, we may actually end up slower (because we
dig further down in history, and don't benefit from the multiple-branch
speedup).


It's good to know that we already have an algorithm for the multi-head 
approach. Things like `git branch -vv` are harder to tease out because 
the graph walk is called by the line-format code.



I tried to do a "best of both" algorithm in:

  https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/

which finds arbitrary numbers of merge bases in a single traversal.  It
did seem to work, but I felt uneasy about some of the corner cases.
I've been meaning to revisit it, but obviously have never gotten around
to it.

The second slow thing is that during the traversal we load each commit
object from disk. The solution there is to keep the parent information
in a faster cache. I had a few proposals over the years, but I won't
even bother to dig them up, because there's quite recent and promising
work in this area from Derrick Stolee:

   
https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is
about using commit timestamps as a proxy for generation numbers. And
Stolee's patches actually leave room for real, trustable generation
numbers.

Once we have the serialized commit graph and generation numbers, I think
the final step would just be to teach the "tag --contains" algorithm to
stop walking down unproductive lines of history. And in fact, I think we
can forget about the best-of-both multi-tip merge-base idea entirely.
Because if you can use the generation numbers to avoid going too deep,
then a depth-first approach is fine. And we'd just want to flip
git-branch over to using that algorithm by default.


I'll keep this in mind as a target for performance measurements in the 
serialized commit graph patch and the following generation number patch.


Thanks,
-Stolee


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-08 Thread csilvers
} I had a few proposals over the years, but I won't even bother to dig
} them up, because there's quite recent and promising work in this
} area from Derrick Stolee:

It sounds like the best thing to do is to wait for this, then.

We managed to convert a bunch of our branches to tags, so our
immediate problem has been resolved.  But I'm sure it will come up
again as more branches are created...

craig


Re: [PATCH 0/4] Teach `--default` to `git-config(1)`

2018-03-05 Thread Taylor Blau
On Mon, Mar 05, 2018 at 06:17:25PM -0800, Taylor Blau wrote:
> Attached is a patch series to introduce `--default` and `--color` to the
> `git-config(1)` builtin with the aim of introducing a consistent interface to
> replace `--get-color`.

This series draws significant inspiration from Soukaina Nait Hmid's work
in:

  
https://public-inbox.org/git/0102015fb0bf2f74-cb456171-fe65-4d83-8784-b553c7c9e584-000...@eu-west-1.amazonses.com/

--
- Taylor


[PATCH 0/4] Teach `--default` to `git-config(1)`

2018-03-05 Thread Taylor Blau
Hi,

Attached is a patch series to introduce `--default` and `--color` to the
`git-config(1)` builtin with the aim of introducing a consistent interface to
replace `--get-color`.
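The semantics the series is after can be modelled as a single lookup that falls back to a caller-supplied default while still applying the type coercion, so callers no longer need a "try --get, then fall back" dance. A toy sketch of those semantics (not git's code; the function name is made up):

```python
def config_get(config, key, default=None, type_fn=str):
    """Model of `git config --default <value> [--type] <key>`: return
    the configured value, or the default when the key is unset, with
    the same type coercion applied on either path."""
    raw = config.get(key)
    if raw is None:
        raw = default
    if raw is None:
        raise KeyError(key)  # no value and no default: error out
    return type_fn(raw)
```

For example, `config_get({}, "core.frobnicate", default="7", type_fn=int)` yields the typed default instead of forcing the caller to special-case the missing key.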

Thank you in advance for reviewing this series; I will be more than happy to
respond to any feedback seeing as I am still quite new to working on Git itself.

Thanks,
Taylor

Taylor Blau (4):
  builtin/config: introduce `--default`
  Documentation: list all type specifiers in config prose
  config.c: introduce 'git_config_color' to parse ANSI colors
  builtin/config: introduce `--color` type specifier

 Documentation/git-config.txt |  21 +---
 builtin/config.c |  36 +
 config.c |  10 
 config.h |   1 +
 t/t1300-repo-config.sh   |  16 ++
 t/t1310-config-default.sh| 119 +++
 6 files changed, 197 insertions(+), 6 deletions(-)
 create mode 100755 t/t1310-config-default.sh

--
2.15.1.354.g95ec6b1b3



Re: [PATCH 0/4] Speed up git tag --contains

2018-03-02 Thread Jeff King
On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:

> > This is a resubmission of Jeff King's patch series to speed up git tag
> > --contains with some changes. It's been cooking for a while as:
> 
> Replying to this 6-year-old thread:
> 
> Is there any chance this could be resurrected?  We are using
> phabricator, which uses `git branch --contains` as part of its
> workflow.  Our repo has ~1000 branches on it, and the contains
> operation is eating up all our CPU (and time).  It would be very
> helpful to us to make this faster!
> 
> (The original thread is at
> https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/

Sorry, this got thrown on my "to respond" pile and languished.

There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base
traversals, which may walk over the same commits multiple times.

These days "tag --contains" uses a different algorithm that can look at
all heads in a single traversal. But the downside is that it's
depth-first, so it tends to walk down to the roots. That's generally OK
for tags, since you often have ancient tags that mean getting close to
the roots anyway.

But for branches, they're more likely to be recent, and you can get away
without going very deep into the history.

So it's a tradeoff. There's no run-time switch to flip between them, but
a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, 
struct ref_sorting *sortin
 
memset(&array, 0, sizeof(array));
 
+   filter->with_commit_tag_algo = 1;
filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
 
if (filter->verbose)

drops my run of "git branch -a --contains HEAD~100" from 8.6s to
0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
a smaller number of branches, we may actually end up slower (because we
dig further down in history, and don't benefit from the multiple-branch
speedup).
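The trade-off described above — N independent merge-base traversals versus one walk shared by all heads — can be modelled in a few lines. A toy sketch over a parent map (nothing like git's actual C code): with a memoized reachability cache shared across tips, history common to many branches is examined once, not once per branch.

```python
def tips_containing(parents, tips, target):
    """Return the tips whose ancestry contains `target`.
    `parents` maps commit -> list of parent commits. The cache is
    shared across all tips, so each commit is evaluated at most once
    -- the single-traversal benefit of the 'tag --contains' algorithm."""
    cache = {}
    def reaches(c):
        if c == target:
            return True
        if c not in cache:
            cache[c] = False  # placeholder; a commit DAG has no cycles
            cache[c] = any(reaches(p) for p in parents.get(c, ()))
        return cache[c]
    return [t for t in tips if reaches(t)]
```

Note this sketch still walks to the roots on a miss, which is exactly the depth-first downside discussed above for branch-like refs.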

I tried to do a "best of both" algorithm in:

 https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/

which finds arbitrary numbers of merge bases in a single traversal.  It
did seem to work, but I felt uneasy about some of the corner cases.
I've been meaning to revisit it, but obviously have never gotten around
to it.

The second slow thing is that during the traversal we load each commit
object from disk. The solution there is to keep the parent information
in a faster cache. I had a few proposals over the years, but I won't
even bother to dig them up, because there's quite recent and promising
work in this area from Derrick Stolee:

  
https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is
about using commit timestamps as a proxy for generation numbers. And
Stolee's patches actually leave room for real, trustable generation
numbers.

Once we have the serialized commit graph and generation numbers, I think
the final step would just be to teach the "tag --contains" algorithm to
stop walking down unproductive lines of history. And in fact, I think we
can forget about the best-of-both multi-tip merge-base idea entirely.
Because if you can use the generation numbers to avoid going too deep,
then a depth-first approach is fine. And we'd just want to flip
git-branch over to using that algorithm by default.
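Under the invariant that a parent's generation number is strictly smaller than its child's, the pruning suggested above can be sketched as follows. This is a toy walk over a parent map with made-up generation numbers, not git's implementation:

```python
def contains(parents, gen, tip, target):
    """Ancestry check that stops walking down unproductive history:
    if gen[c] <= gen[target] and c is not target, then target cannot
    be an ancestor of c, so that line of history is pruned."""
    stack, seen = [tip], set()
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c in seen or gen[c] <= gen[target]:
            continue  # already visited, or too shallow to reach target
        seen.add(c)
        stack.extend(parents.get(c, ()))
    return False
```

With a recent target (high generation number), the walk gives up almost immediately instead of descending to the roots — which is why a depth-first approach becomes acceptable for branches.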

-Peff


Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Duy Nguyen
On Tue, Feb 27, 2018 at 3:46 AM, Stefan Beller  wrote:
> What is the plan from here on?
> Should I build further series on top of yours? The next series will
> focus on the pack side of things (pack.h, packfile.{c,h})
>
> So maybe we'll have Junio merge down my series (and yours as it
> needs one reroll?) and then build on top of that?

I think that's the less painful way for everybody.
-- 
Duy


Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Stefan Beller
On Mon, Feb 26, 2018 at 2:30 AM, Nguyễn Thái Ngọc Duy  wrote:
> It turns out I don't need my other series [1] in order to delete this
> field. This series moves getenv() calls from
> repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in
> environment.c where they belong in my opinion.
>
> The repo_set_gitdir() now takes $GIT_DIR and optionally all other
> configurable paths. If those paths are NULL, default repo layout will
> be used. With getenv() no longer called inside repo_set_gitdir(),
> ignore_env has no reason to stay. This is in 1/4.
>
> The getenv() in prepare_alt_odb() is also moved back to
> setup_git_env() in 3/4. It demonstrates how we could move other
> getenv() back too, if we want.
>
> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.
>

Thanks for working on this,
I found this series a pleasant read; the only issue I saw was Eric's point
about multiple getenv() calls without strdup()ing the content.

What is the plan from here on?
Should I build further series on top of yours? The next series will
focus on the pack side of things (pack.h, packfile.{c,h})

So maybe we'll have Junio merge down my series (and yours as it
needs one reroll?) and then build on top of that?

Thanks,
Stefan


Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.

As a follow-up fix for combined "move things to the_repo as much as
possible" efforts I think what you did makes most sense.  I would
say that it makes even more sense to make these part of the
object-store series if/when the series needs rerolling--after all
the topic is already a multi-author effort.

I'll have to read them carefully before commenting on the actual
patches ;-)  Thanks.

>
> [1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/
>
> Nguyễn Thái Ngọc Duy (4):
>   repository.c: move env-related setup code back to environment.c
>   repository.c: delete dead functions
>   sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
>   repository: delete ignore_env member
>
>  cache.h|  2 +-
>  environment.c  | 13 +++--
>  object-store.h |  5 +++-
>  object.c   |  1 +
>  repository.c   | 79 ++
>  repository.h   | 21 +++---
>  setup.c|  3 +-
>  sha1_file.c|  6 +---
>  8 files changed, 64 insertions(+), 66 deletions(-)


[PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Nguyễn Thái Ngọc Duy
It turns out I don't need my other series [1] in order to delete this
field. This series moves getenv() calls from
repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in
environment.c where they belong in my opinion.

The repo_set_gitdir() now takes $GIT_DIR and optionally all other
configurable paths. If those paths are NULL, default repo layout will
be used. With getenv() no longer called inside repo_set_gitdir(),
ignore_env has no reason to stay. This is in 1/4.
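The shape of the reworked call can be sketched like this — a Python toy of the C API with approximate names, where None stands in for "use the default layout":

```python
import os

def repo_set_gitdir(gitdir, object_dir=None, index_file=None, graft_file=None):
    """Model of the 1/4 change: the caller resolves $GIT_DIR and
    friends (that getenv() work stays in environment.c) and passes the
    results in explicitly; None selects the default repo layout, so
    this function needs neither getenv() nor an ignore_env flag."""
    join = os.path.join
    return {
        "gitdir": gitdir,
        "objects": object_dir or join(gitdir, "objects"),
        "index": index_file or join(gitdir, "index"),
        "grafts": graft_file or join(gitdir, "info", "grafts"),
    }
```

Moving the environment reads to the caller is what makes the ignore_env flag redundant: a caller that wants to ignore the environment simply passes explicit paths.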

The getenv() in prepare_alt_odb() is also moved back to
setup_git_env() in 3/4. It demonstrates how we could move other
getenv() back too, if we want.

This series is built on top of Stefan's object-store-part1, v4. I
could rebase it on 'master' too, but then Junio may need to resolve
some conflicts.

[1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/

Nguyễn Thái Ngọc Duy (4):
  repository.c: move env-related setup code back to environment.c
  repository.c: delete dead functions
  sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
  repository: delete ignore_env member

 cache.h|  2 +-
 environment.c  | 13 +++--
 object-store.h |  5 +++-
 object.c   |  1 +
 repository.c   | 79 ++
 repository.h   | 21 +++---
 setup.c|  3 +-
 sha1_file.c|  6 +---
 8 files changed, 64 insertions(+), 66 deletions(-)

-- 
2.16.1.435.g8f24da2e1a



Re: [PATCH 0/4] Correct offsets of hunks when one is skipped

2018-02-19 Thread Phillip Wood
On 13/02/18 23:56, brian m. carlson wrote:
> On Tue, Feb 13, 2018 at 10:44:04AM +, Phillip Wood wrote:
>> From: Phillip Wood 
>>
>> While working on a patch series to stage selected lines from a hunk
>> without having to edit it I got worried that subsequent patches would
>> be applied in the wrong place which led to this series to correct the
>> offsets of hunks following those that are skipped or edited.
>>
>> Phillip Wood (4):
>>   add -i: add function to format hunk header
>>   t3701: add failing test for pathological context lines
>>   add -p: Adjust offsets of subsequent hunks when one is skipped
>>   add -p: calculate offset delta for edited patches
>>
>>  git-add--interactive.perl  | 93 
>> +++---
>>  t/t3701-add-interactive.sh | 30 +++
>>  2 files changed, 102 insertions(+), 21 deletions(-)
> 
> This looks reasonably sane to me.  I really like that you managed to
> produce failing tests for this situation.  I know pathological cases
> like this have bit GCC in the past, so it's good that you fixed this.
> 

Thanks Brian, it's interesting to hear that GCC has been bitten in the past.

Best Wishes

Phillip


Re: [PATCH 0/4] Correct offsets of hunks when one is skipped

2018-02-13 Thread brian m. carlson
On Tue, Feb 13, 2018 at 10:44:04AM +, Phillip Wood wrote:
> From: Phillip Wood 
> 
> While working on a patch series to stage selected lines from a hunk
> without having to edit it I got worried that subsequent patches would
> be applied in the wrong place which led to this series to correct the
> offsets of hunks following those that are skipped or edited.
> 
> Phillip Wood (4):
>   add -i: add function to format hunk header
>   t3701: add failing test for pathological context lines
>   add -p: Adjust offsets of subsequent hunks when one is skipped
>   add -p: calculate offset delta for edited patches
> 
>  git-add--interactive.perl  | 93 
> +++---
>  t/t3701-add-interactive.sh | 30 +++
>  2 files changed, 102 insertions(+), 21 deletions(-)

This looks reasonably sane to me.  I really like that you managed to
produce failing tests for this situation.  I know pathological cases
like this have bit GCC in the past, so it's good that you fixed this.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH 0/4] Correct offsets of hunks when one is skipped

2018-02-13 Thread Phillip Wood
From: Phillip Wood 

While working on a patch series to stage selected lines from a hunk
without having to edit it I got worried that subsequent patches would
be applied in the wrong place which led to this series to correct the
offsets of hunks following those that are skipped or edited.
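The bookkeeping involved can be sketched as follows — a toy model of the idea with hunks as (old_start, old_len, new_start, new_len) header tuples, not the perl code in the series. A skipped hunk's insertions and deletions never happen, so the new-file start of every later hunk must shift by that hunk's length delta:

```python
def adjust_offsets(hunks, selected):
    """hunks: list of (old_start, old_len, new_start, new_len) tuples.
    selected: indices of hunks the user kept. Each skipped hunk never
    applies, so later hunks' new_start shifts by -(new_len - old_len)
    for every skipped hunk that precedes them."""
    delta, out = 0, []
    for i, (old_start, old_len, new_start, new_len) in enumerate(hunks):
        if i in selected:
            out.append((old_start, old_len, new_start + delta, new_len))
        else:
            delta -= new_len - old_len
    return out
```

Without this correction, a later hunk carries an offset computed as if the skipped hunk had applied, and `git apply` may place it at the wrong lines — the failure mode the pathological-context test exercises.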

Phillip Wood (4):
  add -i: add function to format hunk header
  t3701: add failing test for pathological context lines
  add -p: Adjust offsets of subsequent hunks when one is skipped
  add -p: calculate offset delta for edited patches

 git-add--interactive.perl  | 93 +++---
 t/t3701-add-interactive.sh | 30 +++
 2 files changed, 102 insertions(+), 21 deletions(-)

-- 
2.16.1



Re: [PATCH 0/4] Speed up git tag --contains

2018-01-12 Thread csilvers
> This is a resubmission of Jeff King's patch series to speed up git tag
> --contains with some changes. It's been cooking for a while as:

Replying to this 6-year-old thread:

Is there any chance this could be resurrected?  We are using
phabricator, which uses `git branch --contains` as part of its
workflow.  Our repo has ~1000 branches on it, and the contains
operation is eating up all our CPU (and time).  It would be very
helpful to us to make this faster!

(The original thread is at
https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/
)

craig

