Re: Retrieving a file in git that was deleted and committed

2018-12-06 Thread Bryan Turner
On Thu, Dec 6, 2018 at 11:26 PM Jeff King  wrote:
>
> On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote:
>
> > Thanks! Strangely git log --follow does work.
>
> I suspect it would work even without --follow. When you limit a log
> traversal with a pathspec, like:
>
>   git log foo
>
> that is not about following some continuous stream of content, but
> rather just applying that pathspec to the diff of each commit, and
> pruning ones where it did not change. So even if there are gaps where
> the file did not exist, we continue to apply the pathspec to the older
> commits.

Ah, of course. Thanks for the clarification, Jeff. And my apologies to
Biswaranjan Panda for the incorrect information.

>
> Tools like git-blame will _not_ work, though, as they really are trying
> to track the content as they walk back through history. And once all of
> the content seems to appear from nowhere in your new commit, that is
> a dead end.
>
> In theory there could be some machine-readable annotation in the commit
> object (or in a note created after the fact) to say "even though 'foo'
> is a new file here, it came from $commit:foo".  And then git-blame could
> keep following the content there. But such a feature does not yet exist.
>
> -Peff


Re: Retrieving a file in git that was deleted and committed

2018-12-06 Thread Jeff King
On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote:

> Thanks! Strangely git log --follow does work.

I suspect it would work even without --follow. When you limit a log
traversal with a pathspec, like:

  git log foo

that is not about following some continuous stream of content, but
rather just applying that pathspec to the diff of each commit, and
pruning ones where it did not change. So even if there are gaps where
the file did not exist, we continue to apply the pathspec to the older
commits.

Tools like git-blame will _not_ work, though, as they really are trying
to track the content as they walk back through history. And once all of
the content seems to appear from nowhere in your new commit, that is
a dead end.
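For what it's worth, here is a throwaway-repo sketch of both behaviors (file
name and commit messages invented): the pathspec-limited log sees all three
commits across the gap, while blame attributes every line to the re-adding
commit.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you
echo content >foo.txt
git add foo.txt
git commit -qm "add foo"
git rm -q foo.txt
git commit -qm "delete foo"
echo content >foo.txt
git add foo.txt
git commit -qm "re-add foo"

# The pathspec is applied to each commit's diff, so all three commits
# show up despite the gap where foo.txt did not exist:
git log --oneline -- foo.txt

# blame, by contrast, stops where the content "appears from nowhere":
git blame -l -s foo.txt
```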

In theory there could be some machine-readable annotation in the commit
object (or in a note created after the fact) to say "even though 'foo'
is a new file here, it came from $commit:foo".  And then git-blame could
keep following the content there. But such a feature does not yet exist.

-Peff


Re: Retrieving a file in git that was deleted and committed

2018-12-06 Thread biswaranjan panda
Thanks! Strangely git log --follow does work.
On Thu, Dec 6, 2018 at 10:55 PM Bryan Turner  wrote:
>
> On Thu, Dec 6, 2018 at 10:49 PM biswaranjan panda
>  wrote:
> >
> > I have the following scenario:
> >
> > On a branch A, I deleted a file foo.txt and committed the change. Then
> > I did a bunch of other changes.
> > Now I want to undelete foo.txt.
> >
> > One way is to checkout a separate branch B where the file is present.
> > Then checkout A. Then do
> > git checkout B -- path_to_file
>
> It doesn't change anything, but note that you don't need to checkout B
> first, to restore the file. If you know a commit SHA where the file is
> present, "git checkout SHA -- path_to_file" will pull back the file as
> it existed at that commit.
>
> >
> > While this does get the file back, the file shows up as a new file to
> > be committed. Once I commit it, git blame doesn't show the old history
> > for the file.
> >
> > I would appreciate it if anyone knows how to preserve git blame history
>
> It's not possible, as far as I'm aware. While the new file has the
> same name as the old file, to Git they are two unrelated entries that
> happen to reside at the same path. Even things like "git log --follow"
> won't consider the file to be related to its previous history.
>
> Bryan



-- 
Thanks,
-Biswa


Re: Retrieving a file in git that was deleted and committed

2018-12-06 Thread Bryan Turner
On Thu, Dec 6, 2018 at 10:49 PM biswaranjan panda
 wrote:
>
> I have the following scenario:
>
> On a branch A, I deleted a file foo.txt and committed the change. Then
> I did a bunch of other changes.
> Now I want to undelete foo.txt.
>
> One way is to checkout a separate branch B where the file is present.
> Then checkout A. Then do
> git checkout B -- path_to_file

It doesn't change anything, but note that you don't need to checkout B
first, to restore the file. If you know a commit SHA where the file is
present, "git checkout SHA -- path_to_file" will pull back the file as
it existed at that commit.
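A small self-contained sketch of that (repository, file name, and commit
messages are all made up):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you
echo hello >foo.txt
git add foo.txt
git commit -qm "add foo"
good=$(git rev-parse HEAD)      # any commit SHA where foo.txt exists
git rm -q foo.txt
git commit -qm "delete foo"

# No branch checkout needed; name the commit directly.  This writes
# the file back to the worktree and stages it:
git checkout "$good" -- foo.txt
git commit -qm "restore foo"
```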

>
> While this does get the file back, the file shows up as a new file to
> be committed. Once I commit it, git blame doesn't show the old history
> for the file.
>
> I would appreciate it if anyone knows how to preserve git blame history

It's not possible, as far as I'm aware. While the new file has the
same name as the old file, to Git they are two unrelated entries that
happen to reside at the same path. Even things like "git log --follow"
won't consider the file to be related to its previous history.

Bryan


Retrieving a file in git that was deleted and committed

2018-12-06 Thread biswaranjan panda
I have the following scenario:

On a branch A, I deleted a file foo.txt and committed the change. Then
I did a bunch of other changes.
Now I want to undelete foo.txt.

One way is to checkout a separate branch B where the file is present.
Then checkout A. Then do
git checkout B -- path_to_file

While this does get the file back, the file shows up as a new file to
be committed. Once I commit it, git blame doesn't show the old history
for the file.

I would appreciate it if anyone knows how to preserve git blame history.


Issue with git-gui and font used for widgets

2018-12-06 Thread Eric Work
Hi,

I have set my UI font in the git-gui preferences, but it only affects
the menus and certain widgets.  It does not apply the font to labels
and buttons.  This creates a problem for my HiDPI display because the
fonts are quite small.  I've never programmed in Tcl/Tk before, so I
don't know exactly what is wrong, but looking at the code I see where
it's supposed to apply the font option to the widgets in a foreach
loop.

foreach class {Button Checkbutton Entry Label
		Labelframe Listbox Message
		Radiobutton Spinbox Text} {
	option add *$class.font font_ui
}

But that doesn't seem to work.  As an experiment I added the -font
parameter (according to the docs) to where the branch label is created
and that worked as expected.

${NS}::label .branch.l1 \
  -text [mc "Current Branch:"] \
  -anchor w \
  -justify left \
  -font font_ui

I don't know what the expected behavior is in terms of setting fonts,
but I would assume this is not it?  The widgets appear to be using
TkDefaultFont instead.  The default font being small is really a Tk
problem, whereas setting the widget font is a git-gui problem.  I
created the following bug report against Tk to get some feedback on
the small default font issue, and also documented a workaround to
allow per-user default fonts.

https://core.tcl-lang.org/tk/tktview/dccd82bdc70dc25bb6709a6c14880a92104dda43

Any suggestions?

-Eric


Re: [wishlist] git submodule update --reset-hard

2018-12-06 Thread Yaroslav Halchenko
Hi Stefan,

Thanks for the dialogue!  Replies are embedded below.

On Thu, 06 Dec 2018, Stefan Beller wrote:
> ...
> > > What if the branch differs from the sha1 recorded in the superproject?

> > git reset --hard  itself is an operation which should be done with some
> > level of competence in doing "the right thing" by calling it.  You
> > can hop branches even in current (without any submodules in question)
> > repository with it and cause as much chaos as you desire.

> Right.

> git reset --hard would reset the branch (as well as the working tree) to
> the given sha1, which gets confusing once submodules are involved.

> The Right Thing as of now is the sha1 as found in the
> superprojects gitlink. But as that can be different from any branch
> in the submodule, we'd rather detach the HEAD to make it
> deterministic.

yeap, makes total sense to do that by default ;-)

> There was a proposal to "re-attach HEAD" in the submodule, i.e.
> if the branch branch points at the same commit, we don't need
> a detached HEAD, but could go with the branch instead.

if I got the idea right, and if we are talking about any branch, it
would also be non-deterministic, since who knows what leftover
branch(es) point to that commit.  Not sure I would have used that ;)

> > If desired though, a number of prevention mechanisms could be in place
> > (but would require option(s) to overcome) to allow a submodule to be
> > reset --hard'ed only when some conditions are met (e.g. only to a commit
> > which is among the parent commits of the current branch).  This way wild
> > hops would be prevented, although you might still end up in some feature
> > branch.  But since "reset --hard" itself doesn't have any safe-guards, I
> > do not really think they should be implemented here either.

> So are you looking for
> a) "stay on submodule branch (i.e. HEAD still points at $branch), and
>reset --hard" such that the submodule has a clean index and is at that
>$branch, or
> b) "stay on submodule branch (i.e. HEAD still points at $branch), but
>$branch is set to the gitlink from the superproject, and then a reset
>--hard will have the worktree set to it as well."

yes!

NB "gitlink" -- I just now discovered that term.  I thought it would be
called a "subproject", echoing what git diff/log -p shows for submodule commits.

> (a) is what the referenced submodule.repoLike option implements.

sounds like it indeed, thanks for spelling out

> I'd understand the desire for (b) as well, as it is a "real" hard reset on
> the superproject level, without detaching branches.

yeap
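If it helps, a rough command-line sketch of what (b) amounts to (toy repos;
all paths and names invented, and this is plumbing by hand, not an existing
git option):

```shell
set -e
top=$(mktemp -d)
cd "$top"

# A toy submodule and superproject:
git init -q sub
git -C sub config user.email you@example.com
git -C sub config user.name you
git -C sub commit -q --allow-empty -m "sub v1"
git init -q super
cd super
git config user.email you@example.com
git config user.name you
git -c protocol.file.allow=always submodule add -q ../sub sub
git commit -qm "add sub"

# Let the submodule drift ahead of the recorded gitlink:
git -C sub commit -q --allow-empty -m "stray work"

# (b): stay on the submodule's branch, but point the branch back at
# the sha1 recorded in the superproject's index, then reset --hard:
branch=$(git -C sub symbolic-ref --short HEAD)
gitlink=$(git rev-parse :sub)
git -C sub checkout -q -B "$branch" "$gitlink"
git -C sub reset -q --hard
```

After this the submodule's HEAD is still attached to its branch, and the
superproject sees a clean gitlink again.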

> > >   git reset --hard --recurse-submodules HEAD

> > it does indeed do some trick(s), but not all seem to be the ones I desire:

> > 1. Seems to migrate submodule's .git directories into the top level
> > .git/modules

> Ah yes, that happens too. This will help once you want to git-rm
> a submodule and checkout states before and after.

yeap ;-) 

> > > undesirable in the sense of still having local changes (that is what
> > > the above reset with `--recurse` would fix) or changed the branch
> > > state? (i.e. is detached but was on a branch before?)

> > right -- I meant the local changes, and indeed reset --recurse-submodules
> > seems to recurse nicely.  Then the only undesired effect remaining is
> > the detached HEAD

> For that we may want to revive discussions in
> https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/

well, doesn't that one require a branch to be specified in .gitmodules?

> > > >   git submodule update --recursive

> > > > I would end up in the detached HEADs within submodules.

> > > > What I want is to retain current branch they are at (or may be possible
> > > > "were in"? reflog records might have that information)

> > > So something like

> > >   git submodule foreach --recursive git reset --hard

> > > ?

> > not quite -- this would just kill all local changes within each
> > submodule, not reset it to the desired state, which wouldn't be
> > specified in such an invocation, and is only known to the repo
> > containing it

> With this answer it sounds like you'd want (b) from above.

yeap

> > > You may be interested in
> > > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/
> > > which introduces a switch `submodule.repoLike [ = true]`, which
> > > when set would not detach HEAD in submodules.

> > Thanks! looks interesting -- was there more discussion/activity beyond
> > those 5 posts in the thread?

> Unfortunately there was not.

pity

> > This feature might indeed come in handy, but if I got it right, it is
> > somewhat complementary to just having submodule update --reset-hard.
> > E.g. submodules might be on different branches (if I am not tracking
> > based on branch names), so I would not want a recursive checkout with
> > -b|-B.  But we would indeed benefit from such functionality, since this
> > difficulty of managing branches of submodules I 

Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-06 Thread Jeff King
On Fri, Dec 07, 2018 at 12:10:05AM +0100, SZEDER Gábor wrote:

> On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> > Could we just kill them all?
> > 
> > I guess it's a little tricky, because $! is going to give us the pid of
> > each subshell. We actually want to kill the test sub-process. That takes
> > a few contortions, but the output is nice (every sub-job immediately
> > says "ABORTED ...", and the final process does not exit until the whole
> > tree is done):
> 
> It's trickier than that:
> 
>   - Nowadays our test lib translates SIGINT to exit, but not TERM (or
> HUP, in case a user decides to close the terminal window), which
> means that if the test runs any daemons in the background, then
> those won't be killed when the stress test is aborted.
> 
> This is easy enough to address, and I've been meaning to do this
> in an other patch series anyway.

Yeah, trapping on more signals seems reasonable (or just propagating INT
instead of TERM). I'm more worried about this one:

>   - With the (implied) '--verbose-log' the process killed in the
> background subshell's SIGTERM handler is not the actual test
> process, because '--verbose-log' runs the test again in a piped
> subshell to save its output:

Hmm, yeah. The patch I sent earlier already propagates through the
subshell, so this is really just another layer there. I.e., would it be
reasonable, when we are using --verbose-log (or --tee), for the parent
process to propagate signals down to the child it spawned?

That would fix this case, and it would make things more predictable in
general (e.g., a user who manually runs kill).
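A rough sketch of that kind of forwarding, outside of test-lib (the child
command and variable names are placeholders; the self-delivered TERM exists
only so the demo terminates on its own):

```shell
# A parent wrapper that forwards TERM/HUP/INT to the child it spawned.
sleep 60 &                          # stand-in for the test sub-process
child=$!
aborted=0
trap 'kill "$child" 2>/dev/null; aborted=1' TERM HUP INT

# Only for this demo: deliver a TERM to ourselves after a moment.
( sleep 1; kill -TERM $$ ) &

wait "$child" 2>/dev/null || true   # interrupted by the trapped TERM
wait "$child" 2>/dev/null || true   # reap the killed child
echo "aborted=$aborted"
```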

It does feel like a lot of boilerplate to be propagating these signals
manually. I think the "right" Unix-y solution here is process groups:
each sub-job should get its own group, and then the top-level runner
script can kill the whole group. We may run into portability issues,
though (setsid is in util-linux, but I don't know if there's an
equivalent elsewhere; a small perl or C helper could do it, but I have
no idea what that would mean on Windows).
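For illustration, a sketch of the process-group idea with util-linux setsid
(so this assumes Linux; the sub-job command is a placeholder):

```shell
# Each sub-job gets its own session (and thus process group) via
# setsid; killing the negative PID then signals the whole group.
setsid sh -c 'sleep 60 & sleep 60 & wait' &
leader=$!
sleep 1                             # let the group start up

kill -TERM -- "-$leader" 2>/dev/null || kill -TERM "$leader"
wait "$leader" 2>/dev/null || true  # reap the session leader
sleep 1                             # give group members time to exit
echo "killed group $leader"
```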

>   (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
>echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
> 
> That 'kill' kills the "outer" shell process.
> And though I get "ABORTED..." immediately and I get back my
> prompt right away, the commands involved in the above construct
> are still running in the background:
> 
>   $ ./t3903-stash.sh --stress=1
>   ^CABORTED  0.0
>   $ ps a |grep t3903
>   1280 pts/17   S  0:00 /bin/sh ./t3903-stash.sh --stress=1
>   1281 pts/17   S  0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
>   1282 pts/17   S  0:00 /bin/sh ./t3903-stash.sh --stress=1
>   4173 pts/17   S+ 0:00 grep t3903
> 
> I see this behavior with all shells I tried.
> I haven't yet started to think it through why this happens :)

I think you get ABORTED because we are just watching for the
parent process to exit (and reporting its status). The fact that the
subshell (and the tee) are still running is not important. That all
makes sense to me.

> Not implying '--verbose-log' but redirecting the test script's
> output, like you did in your 'stress' script, seems to work in
> dash, ksh, and ksh93, but not in Bash v4.3 or later, where, for
> whatever reason, the subshell gets SIGINT before the SIGTERM would
> arrive.
> While we could easily handle SIGINT in the subshell with the same
> signal handler as SIGTERM, it bothers me that something
> fundamental is wrong here.

Yeah, I don't understand the SIGINT behavior you're seeing with bash. I
can't replicate it with bash 4.4.23 (from Debian unstable).

-Peff


Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-06 Thread Jeff King
On Thu, Dec 06, 2018 at 11:56:01PM +0100, SZEDER Gábor wrote:

> > +test_expect_success 'roll those dice' '
> > +   case "$(openssl rand -base64 1)" in
> > +   z*)
> > +   return 1
> > +   esac
> > +'
> 
> Wasteful :)
> 
>   test $(($$ % 42)) -ne 0

Oh, indeed, that is much more clever. :)

-Peff


Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2018-12-06 Thread Josh Steadmon
On 2018.11.28 16:27, Stefan Beller wrote:
> This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> with all feedback addressed. As it took some time, I'll send it
> without range-diff, but would ask for full review.
> 
> I plan on resending after the next release as this got delayed quite a bit,
> which is why I also rebased it to master.
> 
> Thanks,
> Stefan

I am not very familiar with most of the submodule code, but for what
it's worth, this entire series looks good to me. I'll note that most of
the commits caused some style complaints, but I'll leave it up to your
judgement as to whether they're valid or not.

Reviewed-by: Josh Steadmon 


Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line

2018-12-06 Thread Junio C Hamano
Jonathan Tan  writes:

>> Jonathan Tan  writes:
>> 
>> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct 
>> > output_state *os)
>> >}
>> >os->used += readsz;
>> >  
>> > +  if (!os->packfile_started) {
>> > +  os->packfile_started = 1;
>> > +  if (use_protocol_v2)
>> > +  packet_write_fmt(1, "packfile\n");
>> 
>> If we fix this function so that the only byte in the buffer is held
>> back without emitted when os->used == 1 as I alluded to, this may
>> have to be done a bit later, as with such a change, it is no longer
>> guaranteed that send_client_data() will be called after this point.
>
> I'm not sure what you mean about there being no guarantee that
> send_client_data() will be called - in create_pack_file(), there is an
> "if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
> outputs anything remaining.

I was referring to this part of the review on the previous step,
which you may not yet have read.

OK, this corresponds to the "*cp++ = buffered"  in the original just
before xread().

> + os->used = 1;
> + } else {
> + send_client_data(1, os->buffer, os->used);
> + os->used = 0;

I am not sure if the code is correct when os->used happens to be 1
(shouldn't we hold the byte, skip the call to send_client_data(),
and go back to poll() to expect more data?), but this is a faithful
code movement and rewrite of the original.

The point of this logic is to make sure we always hold back some
bytes and do not feed *all* the bytes to the other side by calling
"send-client-data" until we made sure the upstream of what we are
relaying (pack-objects?) successfully exited, but it looks to me
that the "else" clause above ends up flushing everything when
os->used is 1, which goes against the whole purpose of the code.

And the "fix" I was alluding to was to update that "else" clause to
make it a no-op that keeps os->used non-zero, which would not call
send-client-data.

When that fix happens, the part that early in the function this
patch added "now we know we will call send-client-data, so let's say
'here comes packdata' unless we have already said that" is making
the decision too early.  Depending on the value of os->used when we
enter the code and the number of bytes xread() reads from the
upstream, we might not call send-client-data yet (namely, when we
have no buffered data and we happened to get only one byte).

> ... it might be
> better if the server can send sideband throughout the whole response -
> perhaps that should be investigated first.

Yup.  It just looked quite crazy, and it is even more crazy to
buffer keepalives ;-)


Re: Any way to make git-log to enumerate commits?

2018-12-06 Thread Junio C Hamano
Konstantin Khomoutov  writes:

>> I do not see why the "name each rev relative to HEAD" formatting
>> option cannot produce HEAD^2~2 etc.
>>  ...
> My reading was that the OP explicitly wanted to just glance at a single
> integer number and use it right away in a subsequent rebase command.
>
> I mean, use one's own short memory instead of copying and pasting.

As everybody pointed out, a single integer number is fundamentally
unworkable with the distributed nature of Git, which inherently produces
histories with forks and merges.  Besides, there is no way to feed
"git log" a single integer number, without which "making
git-log enumerate" would not be all that useful ("git show
HEAD~4" works but "git show 4" does not).

All of these name-rev based suggestions were about using an anchoring
point with a memorable name plus a short path to reach from there,
which I think is the closest thing to "a single integer number" that
is still usable for the exact purpose.  "HEAD~48^2" on an
integration branch would be "the tip of the side branch that was
merged some 48 commits ago", for example.
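A tiny made-up history shows why such an anchored name works where a bare
integer cannot:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "base"
git checkout -q -b side
git commit -q --allow-empty -m "side work"
git checkout -q -                    # back to the integration branch
git merge -q --no-ff -m "merge side" side

# "the tip of the side branch that was merged at HEAD":
git log -1 --format=%s HEAD^2
```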


Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()

2018-12-06 Thread Jonathan Tan
Also CC-ing Stolee since I mention multi-pack indices at the end.

> This seems like a reasonable thing to do, but I have sort of a
> meta-comment. In several places we've started doing this kind of "if
> it's this type of object, do X, otherwise do Y" optimization (e.g.,
> handling large blobs for streaming).
> 
> And in the many cases we end up doubling the effort to do object
> lookups: here we do one lookup to get the type, and then if it's not a
> commit (or if we don't have a commit graph) we end up parsing it anyway.
> 
> I wonder if we could do better. In this instance, it might make sense
> to first see if we actually have a commit graph available (it might not
> have this object, of course, but at least we'd expect it to have most
> commits).

This makes sense - I thought I shouldn't mention the commit graph in the
code since it seems like a layering violation, but I felt the need to
mention commit graph in a comment, so maybe the need to mention commit
graph in the code is there too. Subsequently, maybe the lookup-for-type
could be replaced by a lookup-in-commit-graph (maybe by using
parse_commit_in_graph() directly), which should be at least slightly
faster.

> In general, it would be nice if we had a more incremental API
> for accessing objects: open, get metadata, then read the data. That
> would make these kinds of optimizations "free".

Would this be assuming that to read the data, you would (1) first need to
read the metadata, and (2) there would be no redundancy in reading the
two? It seems to me that for loose objects, you would want to perform
all your reads at once, since any read requires opening the file, and
for commit graphs, you just want to read what you want, since the
metadata and the data are in separate places.

> I don't have numbers for how much the extra lookups cost. The lookups
> are probably dwarfed by parse_object() in general, so even if we save
> only a few full object loads, it may be a win. It just seems a shame
> that we may be making the "slow" paths (when our type-specific check
> doesn't match) even slower.

I agree. I think it will always remain a tradeoff when we have multiple
data sources of objects (loose, packed, commit graph - and we can't
unify them all, since they each have their uses). Unless the multi-pack
index can reference commit graphs as well...then it could be our first
point of reference without introducing any inefficiencies...
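As an aside, here is a quick sketch of making a commit graph available in the
first place, since the lookup optimization only helps when one exists (git
2.19 or later; repo contents invented):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

# Write a commit-graph file covering all reachable commits:
git commit-graph write --reachable

ls .git/objects/info/commit-graph
```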


Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()

2018-12-06 Thread Jonathan Tan
> > This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
> > function.
> 
> This is a mere nicety, not strictly required.
> Before we had parse_commit(struct commit *) which would accomplish the
> same, (and we'd still have that afterwards as a #define falling back onto
> the_repository). As the function get_reference() is not the_repository safe
> as it contains a call to is_promisor_object() that is repository
> agnostic, I think
> it would be fair game to not depend on that series. I am not
> complaining, though.

Good point - I'll base the next version on master (and add a TODO
explaining which functions are not yet converted).

> AFAICT oid_object_info doesn't take advantage of the commit graph,
> but just looks up the object header, which is still less than completely
> parsing it. Then lookup_commit is overly strict, as it may return
> NULL as when there still is a type mismatch (I don't think a mismatch
> could happen here, as both rely on just the object store, and not the
> commit graph.), so this would be just defensive programming for
> the sake of it. I dunno.
> 
> struct commit *c;
> 
> if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
> (c = lookup_commit(revs->repo, oid)) &&
> !repo_parse_commit(revs->repo, c))
> object = (struct object *) c;
> else
> object = parse_object(revs->repo, oid);

I like this way better - I'll do it in the next version.


Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line

2018-12-06 Thread Jonathan Tan
> Jonathan Tan  writes:
> 
> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct 
> > output_state *os)
> > }
> > os->used += readsz;
> >  
> > +   if (!os->packfile_started) {
> > +   os->packfile_started = 1;
> > +   if (use_protocol_v2)
> > +   packet_write_fmt(1, "packfile\n");
> 
> If we fix this function so that the only byte in the buffer is held
> back without emitted when os->used == 1 as I alluded to, this may
> have to be done a bit later, as with such a change, it is no longer
> guaranteed that send_client_data() will be called after this point.

I'm not sure what you mean about there being no guarantee that
send_client_data() will be called - in create_pack_file(), there is an
"if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
outputs anything remaining.

> Isn't progress output that goes to channel #2 pretty much
> independent from the payload stream that carries the pkt-line
> command like "packfile" plus the raw pack stream?  It somehow
> feels like an oxymoron to _buffer_ a progress indicator, as it
> defeats the whole point of a progress report to buffer it.

Yes, it is - I don't fully like this part of the design. I alluded to a
similar issue (keepalive) in the toplevel email [1] and that it might be
better if the server can send sideband throughout the whole response -
perhaps that should be investigated first. If we had sideband throughout
the whole response, we wouldn't need to buffer the progress indicator.

[1] https://public-inbox.org/git/cover.1543879256.git.jonathanta...@google.com/


Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc

2018-12-06 Thread Jonathan Tan
> > +This feature allows servers to serve part of their packfile response as URIs.
> > +This allows server designs that improve scalability in bandwidth and CPU usage
> > +(for example, by serving some data through a CDN), and (in the future) provides
> > +some measure of resumability to clients.
> 
> Without reading the remainder, this makes readers anticipate a few
> good things ;-)
> 
>  - "part of", so pre-generated constant material can be given from
>CDN and then followed-up by "filling the gaps" small packfile,
>perhaps?

Yes :-)

>  - The "part of" transmission may not bring the repository up to
>date wrt to the "want" objects; would this feature involve "you
>asked history up to these commits, but with this pack-uri, you'll
>be getting history up to these (somewhat stale) commits"?

It could be, but not necessarily. In my current WIP implementation, for
example, pack URIs don't give you any commits at all (and thus, no
history) - only blobs. Quite a few people first think of the "stale
clone then top-up" case, though - I wonder if it would be a good idea to
give the blob example in this paragraph in order to put people in the
right frame of mind.

> > +If the client replies with the following arguments:
> > +
> > + * packfile-uris
> > + * thin-pack
> > + * ofs-delta
> 
> "with the following" meaning "with all of the following", or "with
> any of the following"?  Is there a reason why the server side must
> require that the client understands and is willing to accept a
> thin-pack when wanting to use packfile-uris?  The same question for
> the ofs-delta.

"All of the following", but from your later comments, we probably don't
need this section anyway.

> My recommendation is to drop the mention of "thin" and "ofs" from
> the above list, and also from the following paragraph.  The "it MAY
> send" will serve as a practical escape clause to allow a server/CDN
> implementation that *ALWAYS* prepares pregenerated material that can
> only be digested by clients that supports thin and ofs.  Such a server
> can send packfile-URIs only when all of the three are given by the
> client and be compliant.  And such an update to the proposed document
> would allow a more diskful server to prepare both thin and non-thin
> pregenerated packs and choose which one to give to the client depending
> on the capability.

That is true - we can just let the server decide. I'll update the patch
accordingly, and state that the URIs should point to packs with features
like thin-pack and ofs-delta only if the client has declared that it
supports them.

> > +Clients then should understand that the returned packfile could be incomplete,
> > +and that it needs to download all the given URIs before the fetch or clone is
> > +complete. Each URI should point to a Git packfile (which may be a thin pack and
> > +which may contain offset deltas).
> 
> weaken or remove the (parenthetical comment) in the last sentence,
> and replace the beginning of the section with something like
> 
>   If the client replies with 'packfile-uris', when the server
>   sends the packfile, it MAY send a `packfile-uris` section...
> 
> You may steal what I wrote in the above response to help the
> server-side folks to decide how to actually implement the "it MAY
> send a packfile-uris" part in the document.

OK, will do.

> OK, this comes back to what I alluded to at the beginning.  We could
> respond to a full-clone request by feeding a series of packfile-uris
> and some ref information, perhaps like this:
> 
>   * Grab this packfile and update your remote-tracking refs
>   and tags to these values; you'd be as if you cloned the
>   project when it was at v1.0.
> 
>   * When you are done with the above, grab this packfile and
>   update your remote-tracking refs and tags to these values;
>   you'd be as if you cloned the project when it was at v2.0.
> 
>   * When you are done with the above, grab this packfile and
>   update your remote-tracking refs and tags to these values;
>   you'd be as if you cloned the project when it was at v3.0.
> 
>   ...
> 
>   * When you are done with the above, here is the remaining
>   packdata to bring you fully up to date with your original
>   "want"s.
> 
> and before fully reading the proposal, I anticipated that it was
> what you were going to describe.  The major difference is "up to the
> packdata given to you so far, you'd be as if you fetched these" ref
> information, which would allow you to be interrupted and then simply
> resume, without having to remember the set of packfile-uris yet to
> be processed across a fetch/clone failure.  If you successfully fetch
> packfile for ..v1.0, you can update the remote-tracking refs to
> match as if you fetched back when that was the most recent state of
> the project, and then if you failed while transferring packfile for
> v1.0..v2.0, 

Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-06 Thread SZEDER Gábor
On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> Could we just kill them all?
> 
> I guess it's a little tricky, because $! is going to give us the pid of
> each subshell. We actually want to kill the test sub-process. That takes
> a few contortions, but the output is nice (every sub-job immediately
> says "ABORTED ...", and the final process does not exit until the whole
> tree is done):

It's trickier than that:

  - Nowadays our test lib translates SIGINT to exit, but not TERM (or
HUP, in case a user decides to close the terminal window), which
means that if the test runs any daemons in the background, then
those won't be killed when the stress test is aborted.

This is easy enough to address, and I've been meaning to do this
in another patch series anyway.

  - With the (implied) '--verbose-log' the process killed in the
background subshell's SIGTERM handler is not the actual test
process, because '--verbose-log' runs the test again in a piped
subshell to save its output:

  (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
   echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"

That 'kill' kills the "outer" shell process.
And though I get "ABORTED..." immediately and I get back my
prompt right away, the commands involved in the above construct
are still running in the background:

  $ ./t3903-stash.sh --stress=1
  ^CABORTED  0.0
  $ ps a |grep t3903
  1280 pts/17   S  0:00 /bin/sh ./t3903-stash.sh --stress=1
  1281 pts/17   S  0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
  1282 pts/17   S  0:00 /bin/sh ./t3903-stash.sh --stress=1
  4173 pts/17   S+ 0:00 grep t3903

I see this behavior with all shells I tried.
I haven't yet started to think through why this happens :)

Not implying '--verbose-log' but redirecting the test script's
output, like you did in your 'stress' script, seems to work in
dash, ksh, and ksh93, but not in Bash v4.3 or later, where, for
whatever reason, the subshell gets SIGINT before the SIGTERM would
arrive.
While we could easily handle SIGINT in the subshell with the same
signal handler as SIGTERM, it bothers me that something
fundamental is wrong here.
Furthermore, we should then perhaps forbid '--stress' in
combination with '--verbose-log' or '--tee'.


> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 9b7f687396..357dead3ff 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -98,8 +98,9 @@ done,*)
>   mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
>   stressfail="$TEST_RESULTS_BASE.stress-failed"
>   rm -f "$stressfail"
> - trap 'echo aborted >"$stressfail"' TERM INT HUP
> + trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' TERM INT HUP
>  
> + job_pids=
>   job_nr=0
>   while test $job_nr -lt "$job_count"
>   do
> @@ -108,10 +109,13 @@ done,*)
>   GIT_TEST_STRESS_JOB_NR=$job_nr
>   export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
>  
> + trap 'kill $test_pid 2>/dev/null' TERM
> +
>   cnt=0
>   while ! test -e "$stressfail"
>   do
> - if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> + $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & 
> test_pid=$!
> + if wait
>   then
>   printf >&2 "OK   %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
>   elif test -f "$stressfail" &&
> @@ -124,16 +128,11 @@ done,*)
>   fi
>   cnt=$(($cnt + 1))
>   done
> - ) &
> + ) & job_pids="$job_pids $!"
>   job_nr=$(($job_nr + 1))
>   done
>  
> - job_nr=0
> - while test $job_nr -lt "$job_count"
> - do
> - wait
> - job_nr=$(($job_nr + 1))
> - done
> + wait
>  
>   if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
>   then
> 



Re: [BUG REPORT] Git does not correctly replay bisect log

2018-12-06 Thread Christian Couder
On Thu, Dec 6, 2018 at 6:30 PM Lukáš Krejčí  wrote:
>
> I am talking about `git bisect replay`. The shell script, as far as I
> can see, only updates the references (ref/bisect/*) and never checks if
> the revisions marked as 'good' are ancestors of the 'bad' one.
> Therefore, $GIT_DIR/BISECT_ANCESTORS_OK file is never created.

Indeed `git bisect replay` first only updates the references
(ref/bisect/*) according to all the "git bisect {good,bad}"
instructions it finds in the log it is passed. After doing that
though, before exiting, it calls `bisect_auto_next` which calls
`bisect_next` which calls `git bisect--helper --next-all` which checks
the merge bases.

I think it is a bug.

`git bisect replay` is right to only update the references
(ref/bisect/*) and not to compute and checkout the best commit to test
at each step except at the end, but it should probably just create the
$GIT_DIR/BISECT_ANCESTORS_OK file if more than one bisection step has
been performed (because the merge bases are checked as part of the
first bisection step only).
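
The round-trip under discussion is easy to reproduce in a throwaway
repository (a hedged sketch; it shows the log being saved and replayed --
per the report above, the replay path may then skip creating
$GIT_DIR/BISECT_ANCESTORS_OK, so no assertion is made about that file):

```shell
#!/bin/sh
# Sketch: save a bisect log and replay it, as in the reported scenario.
d=${TMPDIR:-/tmp}/bisect_replay_demo
rm -rf "$d" && mkdir -p "$d" && cd "$d"
git init -q
for i in 1 2 3 4 5 6 7 8; do
	echo "$i" >file
	git add file
	git -c user.name=t -c user.email=t@example.com commit -qm "commit $i"
done

git bisect start HEAD HEAD~7	# bad = tip, good = 7 commits back
git bisect good			# mark the checked-out midpoint good
git bisect log >bisect.log	# record the decisions so far
git bisect reset		# throw the bisection state away
git bisect replay bisect.log	# ...and restore it from the log
git bisect log			# the replayed state contains the marks again
```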

> The first time the ancestors are checked is in the helper
> (`git-bisect--helper --next-all`) that has only limited information from
> refs/bisect*.

`git-bisect--helper --next-all` knows how to get refs/bisect*
information, otherwise it couldn't decide which is the next best
commit to test.

Thanks for your help in debugging this,
Christian.




Re: git, monorepos, and access control

2018-12-06 Thread Coiner, John
Johannes,

Thanks for your feedback.

I'm not looking closely at submodules, as it's my understanding that 
VFSForGit does not support them. A VFS would be a killer feature for us. 
If VFSForGit were to support submodules, we'd look at them. They would 
provide access control in a way that's clearly nonabusive. I hear you on 
the drawbacks.

AMD today looks more like the 100 independent repos you describe, except 
we don't have automation at the delivery arcs. Integrations are manual 
and thus not particularly frequent.

I'm probably biased toward favoring a monorepo, which I've seen applied 
at a former employer, versus continuous delivery. That's due to lack of 
personal familiarity with CD -- not any real objections.

Thanks,

John

On 12/06/2018 03:08 PM, Johannes Schindelin wrote:
> Hi,
>
> On Wed, 5 Dec 2018, Jeff King wrote:
>
>> The model that fits more naturally with how Git is implemented would be
>> to use submodules. There you leak the hash of the commit from the
>> private submodule, but that's probably obscure enough (and if you're
>> really worried, you can add a random nonce to the commit messages in the
>> submodule to make their hashes unguessable).
> I hear myself frequently saying: "Friends don't let friends use
> submodules". It's almost like: "Some people think their problem is solved
> by using submodules. Only now they have two problems."
>
> There are big reasons, after all, why some companies go for monorepos: it
> is not for lack of trying to go with submodules, it is the problems that
> were incurred by trying to treat entire repositories the same as single
> files (or even trees): they are just too different.
>
> In a previous life, I also tried to go for submodules, was burned, and had
> to restart the whole thing. We ended up with something that might work in
> this instance, too, although our use case was not need-to-know type of
> encapsulation. What we went for was straight up modularization.
>
> What I mean is that we split the project up into over 100 individual
> projects that are now all maintained in individual repositories, and they
> are connected completely outside of Git, via a dependency management
> system (in this case, Maven, although that is probably too Java-centric
> for AMD's needs).
>
> I just wanted to throw that out here: if you can split up your project
> into individual projects, it might make sense not to maintain them as
> submodules but instead as individual repositories whose artifacts are
> uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
> those artifacts would then be retrieved by the projects that need them.
>
> I figure that that scheme might work for you better than submodules: I
> could imagine that you need to make the build artifacts available even to
> people who are not permitted to look at the corresponding source code,
> anyway.
>
> Ciao,
> Johannes



Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

2018-12-06 Thread SZEDER Gábor
On Wed, Dec 05, 2018 at 04:36:26PM -0500, Jeff King wrote:
> The signal interrupts the first wait.

Ah, of course.  I'm ashamed to say that this is not the first time I've
forgotten about that...

> > Bash 4.3 or later are strange: I get back the shell prompt immediately
> > after ctrl-C as well, so it doesn't appear to be waiting for all
> > remaining jobs to finish either, but! I don't get any of the progress
> > output from those jobs to mess up my next command.
> 
> Interesting. My bash 4.4 seems to behave the same as dash. It almost
> sounds like the SIGINT is getting passed to the subshells for you.
> Probably not really worth digging into, though.

The subshell does indeed get SIGINT.  I don't know why that happens;
to my understanding, it should not.

> In case anybody is playing with this and needed to simulate a stress
> failure, here's what I used:
> 
> diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
> index b6566003dd..a370cd9977 100755
> --- a/t/t0000-basic.sh
> +++ b/t/t0000-basic.sh
> @@ -1171,4 +1171,11 @@ test_expect_success 'very long name in the index handled sanely' '
>   test $len = 4098
>  '
>  
> +test_expect_success 'roll those dice' '
> + case "$(openssl rand -base64 1)" in
> + z*)
> + return 1
> + esac
> +'

Wasteful :)

  test $(($$ % 42)) -ne 0



Re: [WIP RFC 1/5] Documentation: order protocol v2 sections

2018-12-06 Thread Jonathan Tan
> > The git command line expects Git servers to follow a specific order of
> 
> "Command line"?  It sounds like you are talking about the order of
> command line arguments and options, but apparently that is not what
> you are doing.  Is it "The git over-the-wire protocol"?

I meant to say the current Git implementation, as opposed to what is
written in the specification. I'll replace it with "The current C Git
implementation".

> Earlier, we said that shallow-info is not given when packfile is not
> there.  That is captured in the updated EBNF above.  We don't have a
> corresponding removal of a bullet point for wanted-refs section below
> but probably that is because the original did not have corresponding
> bullet point to begin with.

That's because the corresponding bullet point had other information.
Quoted in full below:

>   * This section is only included if the client has requested a
> ref using a 'want-ref' line and if a packfile section is also
> included in the response.

I could reword it to "If a packfile section is included in the response,
this section is only included if the client has requested a ref using a
'want-ref' line", but I don't think that is significantly clearer.
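
For context, the fetch response section ordering under discussion looks
roughly like this in the EBNF of the protocol-v2 document (quoted from
memory as a sketch, not the authoritative grammar):

```
    output = acknowledgments flush-pkt |
             [acknowledgments delim-pkt] [shallow-info delim-pkt]
             [wanted-refs delim-pkt] packfile flush-pkt
```

The patch under review makes the spec's section order match the order the
current C implementation actually emits.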


behaviour of git-blame -M -C (maybe a bug?)

2018-12-06 Thread dmg




hi everybody,

I am the maintainer of cregit. We are trying to improve blame 
traceability at the token level (see 
https://github.com/dmgerman/papers/blob/master/editorials/cregit/cregit.org)


We use git-blame heavily in cregit. One of the features that I 
would like to add to cregit is the ability to track movement of code.


I have been testing git-blame -M -C and I found some behaviour 
that seems incorrect. I have created a very simple repository 
that I think showcases this problem:


https://github.com/dmgerman/testBlameMove

this repo has 4 commits (listed below in order of execution):

1. A file is created tpm-dev.c (authored by D German),
2. a refactoring (code is moved from tpm-dev.c to 
tpm-dev-common.c, a new file). Author is "refactor"
3. a commit that adds some few contiguous lines (the existence of 
this commit seems to matter). Author is "none"
4. a commit that changes few lines and alters the result of blame 
for lines not modified by this commit. Author is "problem"


See below. I am running blame at different commits, showing only 
the lines attributed to author "refactor" (author of commit #2).


dmg@iodine:/tmp/testRepo|master ⇒  git log --oneline
ded1aa1 (HEAD -> master, origin/master) problematic commit
3720e68 simple commit
391adba refactoring
33165cb file before refactoring

if we checkout 391adba and do blame -M -C we get this:

dmg@iodine:/tmp/testRepo|3720e68 ⇒  git checkout 3720e68 && git 
blame -M -C tpm-dev-common.c | grep refactor | head

HEAD is now at 3720e68 simple commit
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  24) 
begin_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  25) 
include|#
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  26) 
directive|include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  27) 
file|"tpm-dev.h"
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  28) 
end_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 147) 
DECL|function|tpm_common_open (struct file * file,struct tpm_chip 
* chip,struct file_priv * priv)
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 148) 
name|void
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 149) 
name|tpm_common_open
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 150) 
parameter_list|(
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 151) 
name|struct


so far, so good. blame detects the movement. Note that the changes 
by refactor are adding 5 lines (24 to 28) and then adding some at 
147 and beyond.


now do it for the next commit: 3720e68


things continue to look good. The changes of this commit do not 
affect any of these lines.


now... the next commit, the problematic: ded1aa1 (author is not 
refactor)


dmg@iodine:/tmp/testRepo|3720e68 ⇒  git checkout ded1aa1 && git 
blame -M -C tpm-dev-common.c | grep refactor | head

Previous HEAD position was 3720e68 simple commit
HEAD is now at ded1aa1 problematic commit
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  24) 
begin_include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  25) 
include|#
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  26) 
directive|include
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  27) 
file|"tpm-dev.h"
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  28) 
end_include

391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  29)
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  30) 
begin_function
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  53) 
name|user_read_timer
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800  54) 
argument_list|)
391adba4 tpm-dev-common.c (refactor 2018-12-06 12:41:10 -0800 150) 
DECL|function|tpm_common_open (struct file * file,struct tpm_chip 
* chip,struct file_priv * priv)


now blame assigns the lines 29, 30, 53 and 54 to commit 391adba4 
refactor!!! This is what I think is a bug.
(by the way, the changes made in this last commit were between 28 
and 150)
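
For anyone who wants to experiment without cloning the repository above, a
throwaway history with the same shape can be generated like this (a hedged
sketch with simplified file contents; whether it triggers the same
misattribution may depend on blame's move/copy heuristics):

```shell
#!/bin/sh
# Sketch: create-file, move-bulk-to-new-file, small-addition, small-change,
# then run blame -M -C on the moved-to file.
d=${TMPDIR:-/tmp}/blame_move_demo
rm -rf "$d" && mkdir -p "$d" && cd "$d"
git init -q
ci() { git add -A && git -c user.name="$1" -c user.email="$1@example.com" commit -qm "$2"; }

awk 'BEGIN { for (i = 1; i <= 40; i++)
	printf "original content line %d with some padding text\n", i }' >tpm-dev.c
ci 'D German' 'file before refactoring'

head -n 30 tpm-dev.c >tpm-dev-common.c		# the "refactoring" move
tail -n 10 tpm-dev.c >tmp && mv tmp tpm-dev.c
ci refactor refactoring

printf 'extra line one\nextra line two\n' >>tpm-dev-common.c
ci none 'simple commit'

sed 's/line 5 /line 5 CHANGED /' tpm-dev-common.c >tmp && mv tmp tpm-dev-common.c
ci problem 'problematic commit'

git blame -M -C tpm-dev-common.c
```

With -C, the 30 moved lines should normally stay attributed to the first
author, since the source file was modified in the same commit as the move.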


thank you in advance for any clues on why git-blame is behaving 
like this.


--dmg

---
D M German
http://turingmachine.org




Re: enhancement: support for author.email and author.name in "git config"

2018-12-06 Thread Johannes Schindelin
Hi William,

On Thu, 6 Dec 2018, William Hubbs wrote:

> On Thu, Dec 06, 2018 at 07:38:08PM +0100, Martin Ågren wrote:
> > Hi William,
> > 
> > [...]
> > 
> > This idea was floated a couple of months ago [1]. Junio seemed to find
> > the request sensible and outlined a design. No patches materialized, as
> > far as I know, but that could be because Eric suggested a tool called
> > direnv. Maybe that would work for you.
>  
>  Yes, this design would solve the issue.

There *is* a way to get what you want that is super easy and will
definitely work: if you sit down and do it ;-)

Please let us know if you need any additional information before you can
start.

Ciao,
Johannes

Re: git, monorepos, and access control

2018-12-06 Thread Stefan Beller
On Thu, Dec 6, 2018 at 12:09 PM Johannes Schindelin
 wrote:
>
> Hi,
>
> On Wed, 5 Dec 2018, Jeff King wrote:
>
> > The model that fits more naturally with how Git is implemented would be
> > to use submodules. There you leak the hash of the commit from the
> > private submodule, but that's probably obscure enough (and if you're
> > really worried, you can add a random nonce to the commit messages in the
> > submodule to make their hashes unguessable).
>
> I hear myself frequently saying: "Friends don't let friends use
> submodules". It's almost like: "Some people think their problem is solved
> by using submodules. Only now they have two problems."

Blaming tools for their lack of evolution/development is not necessarily the
right approach. I recall having patches rejected on this very mailing list
that fixed obvious but minor good things like whitespaces and coding style,
because it *might* produce merge conflicts. Would that situation warrant me
to blame the lacks in the merge algorithm, or could you imagine a better
way out? (No need to answer, it's purely to demonstrate that blaming
tooling is not always the right approach; only sometimes it may be)

> There are big reasons, after all, why some companies go for monorepos: it
> is not for lack of trying to go with submodules, it is the problems that
> were incurred by trying to treat entire repositories the same as single
> files (or even trees): they are just too different.

We could change that in more places.

One example you might think of is the output of git-status that displays
changed files. And in case of submodules it would just show
"submodule changes", which we already differentiate into "dirty tree" and
"different sha1 at HEAD".
Instead we could have the output of all changed files recursively in the
superproject's git-status output.

Another example is the diff machinery, which already knows some
basics such as embedding submodule logs or actual diffs.

> In a previous life, I also tried to go for submodules, was burned, and had
> to restart the whole thing. We ended up with something that might work in
> this instance, too, although our use case was not need-to-know type of
> encapsulation. What we went for was straight up modularization.

So this is a "Fix the data instead of the tool", which seems to be a local
optimization (i.e. you only have to do it once, such that it is cheaper to
do than fixing the tool for that workgroup)
... and because everyone does that the tool never gets fixed.

> What I mean is that we split the project up into over 100 individual
> projects that are now all maintained in individual repositories, and they
> are connected completely outside of Git, via a dependency management
> system (in this case, Maven, although that is probably too Java-centric
> for AMD's needs).

Once you have the dependency management system in place, you
will encounter the rare case of still wanting to change things across
repository boundaries at the same time. Submodules offer that, which
is why Android wants to migrate off of the repo tool, and there it seems
natural to go for submodules.

> I just wanted to throw that out here: if you can split up your project
> into individual projects, it might make sense not to maintain them as
> submodules but instead as individual repositories whose artifacts are
> uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
> those artifacts would then be retrieved by the projects that need them.

This is cool and industry standard. But once you happen to run in a bug
that involves 2 new artifacts (but each of the new artifacts work fine
on their own), then you'd wish for something like "git-bisect but across
repositories". Submodules (in theory) allow for fine grained bisection
across these repository boundaries, I would think.

> I figure that that scheme might work for you better than submodules: I
> could imagine that you need to make the build artifacts available even to
> people who are not permitted to look at the corresponding source code,
> anyway.

This is a sensible suggestion, as they probably don't want to ramp up
development on submodules. :-)


Re: [RFC] cherry-pick notes to find out cherry-picks from the origin

2018-12-06 Thread Tejun Heo
Hello, Jeff.

So, this is what I currently have.  It still does the same thing but is
a lot more generic in terms of both interface and implementation.

* All core logics are implemented as core helpers / features.

  * Trailer parsing and reverse-mapping in trailer_rev_xrefs_*().

  * Note refs which start with xref- (cross-reference) are recognized
by notes core.  When notes are added, a dedicated combine_notes
function is used to remove duplicates and cull unreachable
commits.  When xref- notes are formatted for printing, it
automatically follows and prints nested xrefs.

* note-cherry-picks is replaced with reverse-trailer-xrefs which can
  use other trailers, note refs and tags.  --xref-cherry-picks option
  makes it use the cherry-pick presets.

Please note that the patch is still a bit rough.  I'm polishing and
documenting.  Please let me know what you think.

Thanks.
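
As a rough model of the reverse-mapping step (a hedged shell sketch of what
the series implements in C, using a throwaway repository and made-up
commit messages):

```shell
#!/bin/sh
# Sketch: build a reverse map origin-commit -> cherry-picking-commit by
# scanning "(cherry picked from commit ...)" lines in all commit messages.
d=${TMPDIR:-/tmp}/xref_demo
rm -rf "$d" && mkdir -p "$d" && cd "$d"
git init -q
ci() { git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m "$1"; }

ci 'original fix'
origin=$(git rev-parse HEAD)
ci "backport

(cherry picked from commit $origin)"
ci 'unrelated'

for c in $(git rev-list --all); do
	git log -1 --format=%B "$c" |
	sed -n 's/.*(cherry picked from commit \([0-9a-f]*\)).*/\1/p' |
	while read -r origin_sha; do
		printf '%s <- %s\n' "$origin_sha" "$c"
	done
done >xrefs.txt
cat xrefs.txt
```

The real series stores this mapping as notes under xref- refs instead of a
flat file, so git-log can follow and print the cross-references.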

---
 Makefile|1 
 builtin.h   |1 
 builtin/reverse-trailer-xrefs.c |  148 
 git.c   |1 
 notes.c |  245 +++-
 notes.h |   10 +
 object.c|4 
 object.h|6 
 trailer.c   |  102 
 trailer.h   |   26 
 10 files changed, 540 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 1a44c811a..3c23ecf9d 100644
--- a/Makefile
+++ b/Makefile
@@ -1086,6 +1086,7 @@ BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
+BUILTIN_OBJS += builtin/reverse-trailer-xrefs.o
 BUILTIN_OBJS += builtin/pack-objects.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
diff --git a/builtin.h b/builtin.h
index 6538932e9..51089e258 100644
--- a/builtin.h
+++ b/builtin.h
@@ -195,6 +195,7 @@ extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix)
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
+extern int cmd_reverse_trailer_xrefs(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
diff --git a/builtin/reverse-trailer-xrefs.c b/builtin/reverse-trailer-xrefs.c
new file mode 100644
index 0..b2879be6c
--- /dev/null
+++ b/builtin/reverse-trailer-xrefs.c
@@ -0,0 +1,148 @@
+#include "builtin.h"
+#include "cache.h"
+#include "strbuf.h"
+#include "repository.h"
+#include "config.h"
+#include "commit.h"
+#include "blob.h"
+#include "notes.h"
+#include "notes-utils.h"
+#include "trailer.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "object-store.h"
+#include "parse-options.h"
+
+static const char * const reverse_trailer_xrefs_usage[] = {
+   N_("git reverse_trailer_xrefs [] [...]"),
+   NULL
+};
+
+static const char cherry_picked_prefix[] = "(cherry picked from commit ";
+static int verbose;
+
+static void clear_trailer_xref_note(struct commit *commit, void *data)
+{
+   struct notes_tree *tree = data;
+   int status;
+
+   status = remove_note(tree, commit->object.oid.hash);
+
+   if (verbose) {
+   if (status)
+   fprintf(stderr, "Object %s has no note\n",
+   oid_to_hex(&commit->object.oid));
+   else
+   fprintf(stderr, "Removing note for object %s\n",
+   oid_to_hex(&commit->object.oid));
+   }
+}
+
+static void record_trailer_xrefs(struct commit *commit, void *data)
+{
+   trailer_rev_xrefs_record(data, commit);
+}
+
+static int note_trailer_xrefs(struct notes_tree *tree,
+ struct commit *from_commit, struct object_array *to_objs,
+ const char *tag)
+{
+   char from_hex[GIT_MAX_HEXSZ + 1];
+   struct strbuf note = STRBUF_INIT;
+   struct object_id note_oid;
+   int i, ret;
+
+   oid_to_hex_r(from_hex, &from_commit->object.oid);
+
+   for (i = 0; i < to_objs->nr; i++) {
+   const char *hex = to_objs->objects[i].name;
+
+   if (tag)
+   strbuf_addf(&note, "%s: %s\n", tag, hex);
+   else
+   strbuf_addf(&note, "%s\n", hex);
+   if (verbose)
+   fprintf(stderr, "Adding note %s -> %s\n", from_hex, hex);
+   }
+
+   ret = write_object_file(note.buf, note.len, blob_type, &note_oid);
+   strbuf_release(&note);
+   if (ret)
+   return ret;
+
+   ret = add_note(tree, 

Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2018-12-06 Thread Stefan Beller
On Tue, Dec 4, 2018 at 7:10 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> > with all feedback addressed. As it took some time, I'll send it
> > without range-diff, but would ask for full review.
>
> Is that a "resend" or reroll/update (or whatever word that does not
> imply "just sending the same thing again")?

As you noticed, it is an actual update. I started to use "resend"
as DScho seems very unhappy with the word "reroll", claiming we'd
be the only software community that uses that term for
an iteration of a change.

I see how resend could sound like retransmission without change.


> child_process_init(cp);
>  -  cp->dir = strbuf_detach(&submodule_path, NULL);
> -   prepare_submodule_repo_env(&cp->env_array);
>  +  cp->dir = xstrdup(repo->worktree);
> +   prepare_submodule_repo_env(&cp->env_array);
>
> Hmph, I offhand do not see there would be any difference if you
> assigned to cp->dir before or after preparing the repo env, but is
> there a reason these two must be done in this updated order that I
> am missing?  Very similar changes appear multiple times in this
> range-diff.

Jonathan Tan asked for it to be "diff friendly". This -of course- is
range-diff unfriendly.

> [...]

you seem to be OK with a lot of the changes; I did not find an
actionable suggestion.

Thanks for still queuing topics during -rc time,
Stefan


Re: [wishlist] git submodule update --reset-hard

2018-12-06 Thread Stefan Beller
On Thu, Dec 6, 2018 at 1:25 PM Yaroslav Halchenko  wrote:
>
>
> On Thu, 06 Dec 2018, Stefan Beller wrote:
>
> > On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko  
> > wrote:
>
> > > Dear Git Gurus,
>
> > > I wondered what would be your take on my wishlist request to add
> > > --reset-hard option, which would be very similar to regular "update" which
> > > checks out necessary commit, but I want it to remain in the branch.
>
> > What if the branch differs from the sha1 recorded in the superproject?
>
> git reset --hard  itself is an operation which should be done with some
> level of competence in doing "the right thing" by calling it.  You
> can hop branches even in current (without any submodules in question)
> repository with it and cause as much chaos as you desire.

Right.

git reset --hard would reset the branch (as well as the working tree) to
the given sha1, which gets confusing once submodules are involved.

The Right Thing as of now is the sha1 as found in the
superprojects gitlink. But as that can be different from any branch
in the submodule, we'd rather detach the HEAD to make it
deterministic.

There was a proposal to "re-attach HEAD" in the submodule, i.e.
if the branch points at the same commit, we don't need
a detached HEAD, but could go with the branch instead.

> If desired though, a number of prevention mechanisms could be in place (but
> would require option(s) to overcome) to allow submodule to be reset --hard'ed
> only when some conditions met (e.g. only to the commit which is among parent
> commits path of the current branch).  This way wild hops would be prevented,
> although you might still end up in some feature branch.  But since "reset
> --hard" itself doesn't have any safe-guards, I do not really think they should
> be implemented here either.

So are you looking for
a) "stay on the submodule branch (i.e. HEAD still points at $branch), and
   reset --hard", such that the submodule has a clean index and is at that
   $branch, or
b) "stay on the submodule branch (i.e. HEAD still points at $branch), but
   $branch is set to the gitlink from the superproject", and then a reset
   --hard will have the worktree set to it as well?

(a) is what the referenced submodule.repoLike option implements.

I'd understand the desire for (b) as well, as it is a "real" hard reset on
the superproject level, without detaching branches.
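
Interpretation (b) can be approximated today with plumbing, since
`git reset --hard <commit>` moves the checked-out branch without detaching
HEAD (a hedged sketch in a throwaway repository, not a proposed
implementation):

```shell
#!/bin/sh
# Sketch: put the submodule's *current branch* back onto the gitlink
# recorded in the superproject, instead of detaching HEAD.
d=${TMPDIR:-/tmp}/submodule_reset_demo
rm -rf "$d" && mkdir -p "$d" && cd "$d"
git init -q sub
git -C sub -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one

git init -q super && cd super
git -c protocol.file.allow=always submodule --quiet add "$d/sub" sub
git -c user.name=t -c user.email=t@example.com commit -qm 'add sub'

# let the submodule drift ahead of the recorded gitlink
git -C sub -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m drift

gitlink=$(git ls-tree HEAD sub | awk '{print $3}')	# sha1 the superproject records
git -C sub reset -q --hard "$gitlink"			# branch moves, HEAD stays attached
git -C sub symbolic-ref --short HEAD			# still on a branch
```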

> >   git reset --hard --recurse-submodules HEAD

> it does indeed some trick(s) but not all seems to be the ones I desire:
>
> 1. Seems to migrate submodule's .git directories into the top level
> .git/modules

Ah yes, that happens too. This will help once you want to git-rm
a submodule and checkout states before and after.

> > undesirable in the sense of still having local changes (that is what
> > the above reset with `--recurse` would fix) or changed the branch
> > state? (i.e. is detached but was on a branch before?)
>
> right -- I meant the local changes, and reset --recurse-submodules
> indeed seems to recurse nicely.  Then the only undesired effect
> remaining is the detached HEAD

For that we may want to revive discussions in
https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/


> > >   git submodule update --recursive
>
> > > I would end up in the detached HEADs within submodules.
>
> > > What I want is to retain current branch they are at (or may be possible
> > > "were in"? reflog records might have that information)
>
> > So something like
>
> >   git submodule foreach --recursive git reset --hard
>
> > ?
>
> not quite -- this would just kill all local changes within each
> submodule, not reset it to the desired state, which wouldn't be
> specified in such an invocation, and is only known to the repo
> containing it

With this answer it sounds like you'd want (b) from above.

> > You may be interested in
> > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/
> > which introduces a switch `submodule.repoLike [ = true]`, which
> > when set would not detach HEAD in submodules.
>
> Thanks! looks interesting -- was there more discussion/activity beyond those 5
> posts in the thread?

Unfortunately there was not.

> This feature might indeed come handy but if I got it right, it is somewhat
> complimentary to just having submodule update --reset-hard .  E.g.  submodules
> might be in different branches (if I am not tracking based on branch names), 
> so
> I would not want a recursive checkout with -b|-B.  But we would indeed benefit
> from such functionality, since this difficulty of managing branches of
> submodules I think would be alleviated with it! (e.g. in one use case we 
> probably
> will end up with a few thousands of submodules, and at least 3 branches in 
> each
> which would need to be in sync, and typically you wouldn't want different
> branches to be checked out in different submodules)
>
> > Can you say more about the first question above:
> > Would you typically have situations where the
> > submodule branch is out 

[PATCH] terminology tweak: prune -> path limiting

2018-12-06 Thread Matthew DeVore
In the codebase, "prune" is a highly overloaded term, and it caused me a
lot of trouble to figure out what it meant when it was used in the
context of path limiting. Stop using the word "prune" when we really
mean "path limiting."

Signed-off-by: Matthew DeVore 
---
 Documentation/technical/api-history-graph.txt |  5 +-
 builtin/add.c |  4 +-
 builtin/diff.c|  8 +-
 builtin/fast-export.c |  2 +-
 builtin/log.c |  2 +-
 builtin/rev-list.c|  2 +-
 diff-lib.c|  6 +-
 revision.c| 90 ++-
 revision.h|  4 +-
 t/t7811-grep-open.sh  |  2 +-
 tree-walk.c   | 10 +--
 wt-status.c   |  4 +-
 12 files changed, 72 insertions(+), 67 deletions(-)

diff --git a/Documentation/technical/api-history-graph.txt b/Documentation/technical/api-history-graph.txt
index d0d1707c8c..f9a100f88c 100644
--- a/Documentation/technical/api-history-graph.txt
+++ b/Documentation/technical/api-history-graph.txt
@@ -100,8 +100,9 @@ Limitations
   on all parents of that commit.  Parents must not be skipped, or the graph
   output will appear incorrect.
 +
-`graph_update()` may be used on a pruned set of commits only if the parent list
-has been rewritten so as to include only ancestors from the pruned set.
+`graph_update()` may be used on a pruned (e.g. path-limited) set of commits only
+if the parent list has been rewritten so as to include only ancestors from the
+pruned set.
 
 * The graph API does not currently support reverse commit ordering.  In
   order to implement reverse ordering, the graphing API needs an
diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..4abd8ebba8 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -113,14 +113,14 @@ int add_files_to_cache(const char *prefix,
repo_init_revisions(the_repository, &rev, prefix);
setup_revisions(0, NULL, &rev, NULL);
if (pathspec)
-   copy_pathspec(&rev.prune_data, pathspec);
+   copy_pathspec(&rev.path_limits, pathspec);
rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = update_callback;
rev.diffopt.format_callback_data = &data;
rev.diffopt.flags.override_submodule_config = 1;
rev.max_count = 0; /* do not compare unmerged paths with stage #2 */
run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
-   clear_pathspec(&rev.prune_data);
+   clear_pathspec(&rev.path_limits);
return !!data.add_errors;
 }
 
diff --git a/builtin/diff.c b/builtin/diff.c
index f0393bba23..9010b3228a 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -74,8 +74,8 @@ static int builtin_diff_b_f(struct rev_info *revs,
if (argc > 1)
usage(builtin_diff_usage);
 
-   GUARD_PATHSPEC(&revs->prune_data, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
-   path = revs->prune_data.items[0].match;
+   GUARD_PATHSPEC(&revs->path_limits, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
+   path = revs->path_limits.items[0].match;

if (lstat(path, &st))
die_errno(_("failed to stat '%s'"), path);
@@ -421,8 +421,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
die(_("unhandled object '%s' given."), name);
}
}
-   if (rev.prune_data.nr)
-   paths += rev.prune_data.nr;
+   if (rev.path_limits.nr)
+   paths += rev.path_limits.nr;
 
/*
 * Now, do the arguments look reasonable?
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9e283482ef..6a675b5737 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1143,7 +1143,7 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
import_marks(import_filename);
lastimportid = last_idnum;
 
-   if (import_filename && revs.prune_data.nr)
+   if (import_filename && revs.path_limits.nr)
full_tree = 1;
 
get_tags_and_duplicates();
diff --git a/builtin/log.c b/builtin/log.c
index 45aa376a59..f8554d7fa1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -690,7 +690,7 @@ static void log_setup_revisions_tweak(struct rev_info *rev,
  struct setup_revision_opt *opt)
 {
if (rev->diffopt.flags.default_follow_renames &&
-   rev->prune_data.nr == 1)
+   rev->path_limits.nr == 1)
rev->diffopt.flags.follow_renames = 1;
 
/* Turn --cc/-c into -p --cc/-c when -p was not given */
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3a2c0c23b6..edb0799533 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -517,7 +517,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
if (show_progress)
progress 

[PATCH] fetch: ensure submodule objects fetched

2018-12-06 Thread Stefan Beller
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C " (with some submodule related options
such as prefix and recursing strategy, but) without any information about the
remote or the tip that should be fetched.

But this default fetch is not sufficient, as a newly fetched commit in
the superproject could point to a commit in the submodule that is not
in the default refspec. This is common in workflows like Gerrit's.
When fetching a Gerrit change under review (from refs/changes/??), the
commits in that change likely point to submodule commits that have not
been merged to a branch yet.

Fetch a submodule object by id if the object that the superproject
points to, cannot be found. For now this object is fetched from the
'origin' remote as we defer getting the default remote to a later patch.

A list of new submodule commits is already generated in certain
conditions (by check_for_new_submodule_commits()); this new feature
invokes that function in more situations.

The submodule checks used to be done only when a ref in the superproject
changed; these checks are extended to also be performed when fetching
into FETCH_HEAD for completeness, and a test for that is added, too.

Signed-off-by: Stefan Beller 
---

Thanks Jonathan for the review!
So it looks like only the last patch needs some improvements,
which is why I'd only resend the last patch here.
Also note the test with intervening superproject commits.

All suggestions sounded sensible, addressing them all,
here is a range-diff to the currently queued version:

Range-diff:
1:  04eb06607b ! 1:  ac6558cbc9 fetch: try fetching submodules if needed objects were not fetched
@@ -1,6 +1,6 @@
 Author: Stefan Beller 
 
-fetch: try fetching submodules if needed objects were not fetched
+fetch: ensure submodule objects fetched
 
 Currently when git-fetch is asked to recurse into submodules, it dispatches
 a plain "git-fetch -C " (with some submodule related options
@@ -14,22 +14,19 @@
 commits in that change likely point to submodule commits that have not
 been merged to a branch yet.
 
-Try fetching a submodule by object id if the object id that the
-superproject points to, cannot be found.
+Fetch a submodule object by id if the object that the superproject
+points to, cannot be found. For now this object is fetched from the
+'origin' remote as we defer getting the default remote to a later patch.
 
-builtin/fetch used to only inspect submodules when they were fetched
-"on-demand", as in either on/off case it was clear whether the submodule
-needs to be fetched. However to know whether we need to try fetching the
-object ids, we need to identify the object names, which is done in this
-function check_for_new_submodule_commits(), so we'll also run that code
-in case the submodule recursion is set to "on".
+A list of new submodule commits is already generated in certain
+conditions (by check_for_new_submodule_commits()); this new feature
+invokes that function in more situations.
 
 The submodule checks used to be done only when a ref in the superproject
 changed; these checks are extended to also be performed when fetching
 into FETCH_HEAD for completeness, and a test for that is added, too.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
  diff --git a/builtin/fetch.c b/builtin/fetch.c
  --- a/builtin/fetch.c
@@ -82,7 +79,7 @@
  
struct string_list changed_submodule_names;
 +
-+  /* The submodules to fetch in */
++  /* Pending fetches by OIDs */
 +  struct fetch_task **oid_fetch_tasks;
 +  int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
  };
@@ -97,13 +94,16 @@
return spf->default_option;
  }
  
++/*
++ * Fetch in progress (if callback data) or
++ * pending (if in oid_fetch_tasks in struct submodule_parallel_fetch)
++ */
 +struct fetch_task {
 +  struct repository *repo;
 +  const struct submodule *sub;
 +  unsigned free_sub : 1; /* Do we need to free the submodule? */
 +
-+  /* fetch specific oids if set, otherwise fetch default refspec */
-+  struct oid_array *commits;
++  struct oid_array *commits; /* Ensure these commits are fetched */
 +};
 +
 +/**
@@ -176,7 +176,6 @@
  
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 -  struct strbuf submodule_prefix = STRBUF_INIT;
-+  int recurse_config;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
const char *default_argv;
 -  const struct submodule *submodule;
@@ -199,11 +198,9 @@
 +  task = fetch_task_create(spf->r, ce->name);
 +  if (!task)
 +  continue;

Re: [wishlist] git submodule update --reset-hard

2018-12-06 Thread Yaroslav Halchenko


On Thu, 06 Dec 2018, Stefan Beller wrote:

> On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko  
> wrote:

> > Dear Git Gurus,

> > I wondered what would be your take on my wishlist request to add
> > --reset-hard option, which would be very similar to regular "update" which
> > checks out necessary commit, but I want it to remain in the branch.

> What if the branch differs from the sha1 recorded in the superproject?

git reset --hard  itself is an operation which requires some competence in
doing "the right thing" from whoever invokes it.  Even in the current
repository (without any submodules in question) you can hop branches with
it and cause as much chaos as you desire.

If desired though, a number of prevention mechanisms could be put in place (but
would require option(s) to overcome) to allow a submodule to be reset --hard'ed
only when some conditions are met (e.g. only to a commit which is among the
parent commits of the current branch).  This way wild hops would be prevented,
although you might still end up in some feature branch.  But since "reset
--hard" itself doesn't have any safeguards, I do not really think they should
be implemented here either.
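
Such a safe-guard is easy enough to prototype in plain shell before teaching
git anything new. A minimal sketch (the throw-away repository only makes it
self-contained; the guard itself is hypothetical behaviour, not anything git
does today):

```shell
# Hypothetical safe-guard: refuse 'reset --hard' unless the target commit
# is an ancestor of the current branch tip.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
target=$(git rev-parse HEAD)      # the state we may want to forget back to
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two

if git merge-base --is-ancestor "$target" HEAD; then
	git reset -q --hard "$target"     # safe: no wild branch hopping
	echo "reset ok"
else
	echo "refusing: $target is not an ancestor of HEAD" >&2
fi
```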

> > Rationale: In DataLad we heavily rely on submodules, and we have established
> > easy ways to do some manipulations across full hierarchies of them. E.g. a
> > single command could introduce a good number of commits across deep hierarchy
> > of submodules, e.g. while committing changes within deep submodule, while also
> > doing all necessary commits in the repositories leading to that submodule so
> > the entire tree of them stays in a "clean" state. The difficulty comes when
> > there is a need to just "forget" some changes.  The obvious way is to e.g.

> >git reset --hard PREVIOUS_STATE

>   git reset --hard --recurse-submodules HEAD

> would do the trick

it does indeed do some trick(s), but not all of them seem to be the ones I desire:

1. It seems to migrate the submodules' .git directories into the top level
.git/modules

$>  git reset --hard --recurse-submodules HEAD^^^
Migrating git directory of 'famface' from
'/tmp/gobbini/famface/.git' to
'/tmp/gobbini/.git/modules/famface'
Migrating git directory of 'famface/data' from
'/tmp/gobbini/famface/data/.git' to
'/tmp/gobbini/.git/modules/famface/modules/data'
Migrating git directory of 'famface/data/scripts/mridefacer' from
'/tmp/gobbini/famface/data/scripts/mridefacer/.git' to

'/tmp/gobbini/.git/modules/famface/modules/data/modules/scripts/mridefacer'
HEAD is now at 9b4296d [DATALAD] aggregated meta data

we might eventually adopt this model, which has already been the default for
years (git-annex seems to be ok with it, in that it then replaces the .git
file with an actual symlink .git -> ../../.git/modules/...  So things seem
to keep working for annex)

2. It still does the detached HEAD for me

$> git submodule status --recursive  
 2569ab436501a832d35afbbe9cc20ffeb6077eb1 famface (2569ab4)
 f1e8c9b8b025c311424283b9711efc6bc906ba2b famface/data (BIDS-v1.0.1)
 49b0fe42696724c2a8492f999736056e51b77358 famface/data/scripts/mridefacer (49b0fe4)


> > in the top level repository.  But that leaves all the submodules now in
> > the undesired state.  If I do

> undesirable in the sense of still having local changes (that is what
> the above reset with `--recurse` would fix) or changed the branch
> state? (i.e. is detached but was on a branch before?)

right -- I meant the local changes, and reset --recurse-submodules
indeed seems to recurse nicely.  The only remaining undesired effect
is then the detached HEAD

> >   git submodule update --recursive

> > I would end up in the detached HEADs within submodules.

> > What I want is to retain current branch they are at (or may be possible
> > "were in"? reflog records might have that information)

> So something like

>   git submodule foreach --recursive git reset --hard

> ?

not quite  -- this would just kill all local changes within each submodule, not
to reset it to the desired state, which wouldn't be specified in such
invocation, and is only known to the repo containing it
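
A rough approximation of the wished-for mode does exist in script form,
though: 'git submodule foreach' exports $sha1, the commit recorded in the
immediate superproject, so each submodule can be reset to exactly that state
while keeping whatever branch it has checked out. A hedged sketch (the
throw-away repositories only make it runnable; none of the safe-guards
discussed above are included):

```shell
# Reset each submodule to the superproject-recorded commit via $sha1,
# without detaching HEAD in the submodule.
set -e
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q sub
git -C sub commit -q --allow-empty -m sub-1
git init -q super
cd super
git commit -q --allow-empty -m super-1
git -c protocol.file.allow=always submodule add -q ../sub sub
git commit -q -m add-sub
git -C sub commit -q --allow-empty -m stray-local-commit  # the state to forget

# The approximation: back to the recorded commit, branch kept checked out.
git submodule foreach --quiet 'git reset -q --hard "$sha1"'
echo "submodule back on branch $(git -C sub rev-parse --abbrev-ref HEAD)"
```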

> You may be interested in
> https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/
> which introduces a switch `submodule.repoLike [ = true]`, which
> when set would not detach HEAD in submodules.

Thanks! looks interesting -- was there more discussion/activity beyond those 5
posts in the thread?
https://public-inbox.org/git/87h8i9ift4@evledraar.gmail.com/#r 

This feature might indeed come in handy but if I got it right, it is somewhat
complementary to just having submodule update --reset-hard .  E.g.  submodules
might be in different branches (if I am not tracking based on branch names), so
I would not want a recursive checkout with -b|-B.  But we would indeed benefit
from such functionality, since this difficulty of managing 

[PATCH v2 2/3] commit-graph: fix buffer read-overflow

2018-12-06 Thread Josh Steadmon
fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon 
---
 commit-graph.c  | 14 --
 t/t5318-commit-graph.sh | 28 
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 07dd410f3c..224a5f161e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
last_chunk_offset = 8;
chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
-   uint32_t chunk_id = get_be32(chunk_lookup + 0);
-   uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+   uint32_t chunk_id;
+   uint64_t chunk_offset;
int chunk_repeated = 0;
 
+   if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH >
+   data + graph_size) {
+   error(_("chunk lookup table entry missing; graph file may be incomplete"));
+   free(graph);
+   return NULL;
+   }
+
+   chunk_id = get_be32(chunk_lookup + 0);
+   chunk_offset = get_be64(chunk_lookup + 4);
+
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5fe21db99f..2503cb0345 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -384,6 +384,29 @@ corrupt_graph_and_verify() {
test_i18ngrep "$grepstr" err
 }
 
+
+# usage: corrupt_and_zero_graph_then_verify <pos> <data> <zero_pos> <string>
+# Manipulates the commit-graph file at <pos> by inserting the data,
+# then zeros the file starting at <zero_pos>. Finally, runs
+# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err'
+# for the given string.
+corrupt_and_zero_graph_then_verify() {
+   corrupt_pos=$1
+   data="${2:-\0}"
+   zero_pos=$3
+   grepstr=$4
+   orig_size=$(stat --format=%s $objdir/info/commit-graph)
+   cd "$TRASH_DIRECTORY/full" &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$corrupt_pos" conv=notrunc &&
+   truncate --size=$zero_pos $objdir/info/commit-graph &&
+   truncate --size=$orig_size $objdir/info/commit-graph &&
+   test_must_fail git commit-graph verify 2>test_err &&
+   grep -v "^+" test_err >err &&
+   test_i18ngrep "$grepstr" err
+}
+
 test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
"graph signature"
@@ -484,6 +507,11 @@ test_expect_success 'detect invalid checksum hash' '
"incorrect checksum"
 '
 
+test_expect_success 'detect truncated graph' '
+   corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \
+   $GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing"
+'
+
 test_expect_success 'git fsck (checks commit-graph)' '
cd "$TRASH_DIRECTORY/full" &&
git fsck &&
-- 
2.20.0.rc2.10.g7519fc76df



[PATCH v2 3/3] Makefile: correct example fuzz build

2018-12-06 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b72f37c29..bbcfc2bc9f 100644
--- a/Makefile
+++ b/Makefile
@@ -3104,7 +3104,7 @@ cover_db_html: cover_db
 # An example command to build against libFuzzer from LLVM 4.0.0:
 #
 # make CC=clang CXX=clang++ \
-#  FUZZ_CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
+#  CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
 #  LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
 #  fuzz-all
 #
-- 
2.20.0.rc2.10.g7519fc76df



[PATCH v2 1/3] commit-graph, fuzz: Add fuzzer for commit-graph

2018-12-06 Thread Josh Steadmon
Breaks load_commit_graph_one() into a new function,
parse_commit_graph(). The latter function operates on arbitrary buffers,
which makes it suitable as a fuzzing target. Since parse_commit_graph()
is only called by load_commit_graph_one() (and the fuzzer described
below), we omit error messages that would be duplicated by the caller.

Adds fuzz-commit-graph.c, which provides a fuzzing entry point
compatible with libFuzzer (and possibly other fuzzing engines).

Signed-off-by: Josh Steadmon 
---
 .gitignore  |  1 +
 Makefile|  1 +
 commit-graph.c  | 53 ++---
 commit-graph.h  |  3 +++
 fuzz-commit-graph.c | 16 ++
 5 files changed, 57 insertions(+), 17 deletions(-)
 create mode 100644 fuzz-commit-graph.c

diff --git a/.gitignore b/.gitignore
index 0d77ea5894..8bcf153ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
+/fuzz-commit-graph
 /fuzz_corpora
 /fuzz-pack-headers
 /fuzz-pack-idx
diff --git a/Makefile b/Makefile
index 1a44c811aa..6b72f37c29 100644
--- a/Makefile
+++ b/Makefile
@@ -684,6 +684,7 @@ SCRIPTS = $(SCRIPT_SH_INS) \
 
 ETAGS_TARGET = TAGS
 
+FUZZ_OBJS += fuzz-commit-graph.o
 FUZZ_OBJS += fuzz-pack-headers.o
 FUZZ_OBJS += fuzz-pack-idx.o
 
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..07dd410f3c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -84,16 +84,10 @@ static int commit_graph_compatible(struct repository *r)
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
void *graph_map;
-   const unsigned char *data, *chunk_lookup;
size_t graph_size;
struct stat st;
-   uint32_t i;
-   struct commit_graph *graph;
+   struct commit_graph *ret;
int fd = git_open(graph_file);
-   uint64_t last_chunk_offset;
-   uint32_t last_chunk_id;
-   uint32_t graph_signature;
-   unsigned char graph_version, hash_version;
 
if (fd < 0)
return NULL;
@@ -108,27 +102,55 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
die(_("graph file %s is too small"), graph_file);
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   ret = parse_commit_graph(graph_map, fd, graph_size);
+
+   if (!ret) {
+   munmap(graph_map, graph_size);
+   close(fd);
+   exit(1);
+   }
+
+   return ret;
+}
+
+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
+   size_t graph_size)
+{
+   const unsigned char *data, *chunk_lookup;
+   uint32_t i;
+   struct commit_graph *graph;
+   uint64_t last_chunk_offset;
+   uint32_t last_chunk_id;
+   uint32_t graph_signature;
+   unsigned char graph_version, hash_version;
+
+   if (!graph_map)
+   return NULL;
+
+   if (graph_size < GRAPH_MIN_SIZE)
+   return NULL;
+
data = (const unsigned char *)graph_map;
 
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
error(_("graph signature %X does not match signature %X"),
  graph_signature, GRAPH_SIGNATURE);
-   goto cleanup_fail;
+   return NULL;
}
 
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
error(_("graph version %X does not match version %X"),
  graph_version, GRAPH_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
hash_version = *(unsigned char*)(data + 5);
if (hash_version != GRAPH_OID_VERSION) {
error(_("hash version %X does not match version %X"),
  hash_version, GRAPH_OID_VERSION);
-   goto cleanup_fail;
+   return NULL;
}
 
graph = alloc_commit_graph();
@@ -152,7 +174,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 
switch (chunk_id) {
@@ -187,7 +210,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 
if (chunk_repeated) {
error(_("chunk id %08x appears multiple times"), chunk_id);
-   goto cleanup_fail;
+   free(graph);
+   return NULL;
}
 
if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
@@ -201,11 +225,6 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
}
 
return graph;
-
-cleanup_fail:
-   

[PATCH v2 0/3] Add commit-graph fuzzer and fix buffer overflow

2018-12-06 Thread Josh Steadmon
Add a new fuzz test for the commit graph and fix a buffer read-overflow
that it discovered. Additionally, fix the Makefile instructions for
building fuzzers.

Changes since V1:
  * Moved the parse_commit_graph() declaration to the header file, since
we don't mind if others use it.
  * Moved some unnecessary comments into commit messages.
  * Fixed some style issues.
  * Added a test case for detecting commit graphs with missing chunk
lookup entries.
  * Ævar's comments on the Makefile made me realize the fuzzer build
instructions were using the wrong variable. Added a new commit to
fix this.

Josh Steadmon (3):
  commit-graph, fuzz: Add fuzzer for commit-graph
  commit-graph: fix buffer read-overflow
  Makefile: correct example fuzz build

 .gitignore  |  1 +
 Makefile|  3 +-
 commit-graph.c  | 67 +
 commit-graph.h  |  3 ++
 fuzz-commit-graph.c | 16 ++
 t/t5318-commit-graph.sh | 28 +
 6 files changed, 98 insertions(+), 20 deletions(-)
 create mode 100644 fuzz-commit-graph.c

Range-diff against v1:
1:  53e62baaa8 ! 1:  0b57ecbe1b commit-graph, fuzz: Add fuzzer for commit-graph
@@ -4,7 +4,9 @@
 
 Breaks load_commit_graph_one() into a new function,
 parse_commit_graph(). The latter function operates on arbitrary 
buffers,
-which makes it suitable as a fuzzing target.
+which makes it suitable as a fuzzing target. Since parse_commit_graph()
+is only called by load_commit_graph_one() (and the fuzzer described
+below), we omit error messages that would be duplicated by the caller.
 
 Adds fuzz-commit-graph.c, which provides a fuzzing entry point
 compatible with libFuzzer (and possibly other fuzzing engines).
@@ -35,17 +37,6 @@
  diff --git a/commit-graph.c b/commit-graph.c
  --- a/commit-graph.c
  +++ b/commit-graph.c
-@@
- #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
- 
-+struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-+  size_t graph_size);
-+
-+
- char *get_commit_graph_filename(const char *obj_dir)
- {
-   return xstrfmt("%s/info/commit-graph", obj_dir);
 @@
  struct commit_graph *load_commit_graph_one(const char *graph_file)
  {
@@ -70,7 +61,7 @@
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
 +  ret = parse_commit_graph(graph_map, fd, graph_size);
 +
-+  if (ret == NULL) {
++  if (!ret) {
 +  munmap(graph_map, graph_size);
 +  close(fd);
 +  exit(1);
@@ -79,10 +70,6 @@
 +  return ret;
 +}
 +
-+/*
-+ * This function is intended to be used only from load_commit_graph_one() or in
-+ * fuzz tests.
-+ */
 +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 +  size_t graph_size)
 +{
@@ -94,11 +81,9 @@
 +  uint32_t graph_signature;
 +  unsigned char graph_version, hash_version;
 +
-+  /*
-+   * This should already be checked in load_commit_graph_one, but we still
-+   * need a check here for when we're calling parse_commit_graph directly
-+   * from fuzz tests. We can omit the error message in that case.
-+   */
++  if (!graph_map)
++  return NULL;
++
 +  if (graph_size < GRAPH_MIN_SIZE)
 +  return NULL;
 +
@@ -162,12 +147,25 @@
  
 static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
 
+ diff --git a/commit-graph.h b/commit-graph.h
+ --- a/commit-graph.h
+ +++ b/commit-graph.h
+@@
+ 
+ struct commit_graph *load_commit_graph_one(const char *graph_file);
+ 
++struct commit_graph *parse_commit_graph(void *graph_map, int fd,
++  size_t graph_size);
++
+ /*
+  * Return 1 if and only if the repository has a commit-graph
+  * file and generation numbers are computed in that file.
+
  diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
  new file mode 100644
  --- /dev/null
  +++ b/fuzz-commit-graph.c
 @@
-+#include "object-store.h"
 +#include "commit-graph.h"
 +
 +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
@@ -179,9 +177,8 @@
 +{
 +  struct commit_graph *g;
 +
-+  g = parse_commit_graph((void *) data, -1, size);
-+  if (g)
-+  free(g);
++  g = parse_commit_graph((void *)data, -1, size);
++  free(g);
 +
 +  return 0;
 +}
2:  ad2e761f44 ! 2:  af45c2337f commit-graph: fix buffer read-overflow
@@ -22,7 +22,8 @@
 +  uint64_t chunk_offset;
int chunk_repeated = 0;
  
-+  if 

Re: git, monorepos, and access control

2018-12-06 Thread Johannes Schindelin
Hi,

On Wed, 5 Dec 2018, Jeff King wrote:

> The model that fits more naturally with how Git is implemented would be
> to use submodules. There you leak the hash of the commit from the
> private submodule, but that's probably obscure enough (and if you're
> really worried, you can add a random nonce to the commit messages in the
> submodule to make their hashes unguessable).

I hear myself frequently saying: "Friends don't let friends use
submodules". It's almost like: "Some people think their problem is solved
by using submodules. Only now they have two problems."

There are big reasons, after all, why some companies go for monorepos: it
is not for lack of trying to go with submodules, it is the problems that
were incurred by trying to treat entire repositories the same as single
files (or even trees): they are just too different.

In a previous life, I also tried to go for submodules, was burned, and had
to restart the whole thing. We ended up with something that might work in
this instance, too, although our use case was not need-to-know type of
encapsulation. What we went for was straight up modularization.

What I mean is that we split the project up into over 100 individual
projects that are now all maintained in individual repositories, and they
are connected completely outside of Git, via a dependency management
system (in this case, Maven, although that is probably too Java-centric
for AMD's needs).

I just wanted to throw that out here: if you can split up your project
into individual projects, it might make sense not to maintain them as
submodules but instead as individual repositories whose artifacts are
uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
those artifacts would then be retrieved by the projects that need them.

I figure that that scheme might work for you better than submodules: I
could imagine that you need to make the build artifacts available even to
people who are not permitted to look at the corresponding source code,
anyway.

Ciao,
Johannes


Re: enhancement: support for author.email and author.name in "git config"

2018-12-06 Thread William Hubbs
On Thu, Dec 06, 2018 at 07:38:08PM +0100, Martin Ågren wrote:
> Hi William,
> 
> [...]
> 
> This idea was floated a couple of months ago [1]. Junio seemed to find
> the request sensible and outlined a design. No patches materialized, as
> far as I know, but that could be because Eric suggested a tool called
> direnv. Maybe that would work for you.
 
 Yes, this design would solve the issue.

 I'll take a look at direnv, but it would be nice to have native git
 support for this. :-)

Thanks,

William

> [1] https://public-inbox.org/git/0f66ad7a-2289-2cce-6533-a27e19945...@rasmusvillemoes.dk/
> 
> Martin


Re: BUG: CR marker ^M doesn't show up in '-' lines of diffs when the ending of the removed line is CR+LF

2018-12-06 Thread Frank Schäfer


Am 06.12.18 um 01:58 schrieb Junio C Hamano:
> Frank Schäfer  writes:
>
>> Just to be sure that I'm not missing anything here:
>> What's your definition of "LF in repository, CRLF in working tree" in
>> terms of config parameters ?
> :::Documentation/config/core.txt:::
>
> core.autocrlf::
>   Setting this variable to "true" is the same as setting
>   the `text` attribute to "auto" on all files and core.eol to "crlf".
>   Set to true if you want to have `CRLF` line endings in your
>   working directory and the repository has LF line endings.
>   This variable can be set to 'input',
>   in which case no output conversion is performed.
Argh... crap. I was missing that I have to set the "text" attribute
manually...
That's why core.eol=crlf didn't make a difference in my tests. :/
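
For the record, the "LF in repository, CRLF in working tree" combination can
be exercised without core.autocrlf at all, by setting the attribute per path.
A hedged round-trip sketch in a throw-away repository:

```shell
# 'text eol=crlf' normalizes to LF on commit and smudges to CRLF on checkout.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
printf '*.txt text eol=crlf\n' >.gitattributes
printf 'one\ntwo\n' >file.txt
git add .gitattributes file.txt 2>/dev/null   # may warn about the conversion
git -c user.name=t -c user.email=t@example.com commit -q -m init
rm file.txt
git checkout -q -- file.txt   # re-smudge from the repository
wc -c <file.txt               # two CRs added: 8 bytes become 10
```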

Let me thoroughly re-validate all cases.
I will likely not find the time for that before Monday, but I think that
break could be helpful. ;)

Regards,
Frank



Re: [RFC PATCH] Introduce "precious" file concept

2018-12-06 Thread Duy Nguyen
On Tue, Nov 27, 2018 at 1:56 PM Jacob Keller  wrote:
> Personally, I would rather err on the side which requires the least
> interaction from users to avoid silently clobbering an ignored file.
>
> Either Duy's solution with a sort of "untracked" reflog, or the
> garbage/trashable notion.

The "untracked reflog" is partially functional now [1] if you want to
have a look. I'm not going to post the series until post-2.20, but if
you do look, I suggest the first patch that lays out the design and a
plumbing command to manage it. Basically you'll do

git backup-log --id=worktree log 

then pick up the version you like with "git backup-log cat" and do
whatever you want with it. High level UI is not there and will be a
topic of discussion.

The precious/trashable/garbage notion can be used to suppress making backups.

[1] https://gitlab.com/pclouds/git/commits/backup-log
-- 
Duy


Re: enhancement: support for author.email and author.name in "git config"

2018-12-06 Thread Martin Ågren
Hi William,

On Thu, 6 Dec 2018 at 19:18, William Hubbs  wrote:
> We are in a situation where we would like to use author information that is
> different from committer information when we commit to certain
> repositories.

[...]

> [...] I would like to propose the addition of author.email and
> author.name settings to the git-config system.
>
> Additionally you could add committer.name and committer.email, but the
> only reason I bring the committer variations up is consistency since I
> see you also have GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment
> variables.

This idea was floated a couple of months ago [1]. Junio seemed to find
the request sensible and outlined a design. No patches materialized, as
far as I know, but that could be because Eric suggested a tool called
direnv. Maybe that would work for you.
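
For completeness, the direnv route boils down to exporting, per directory
tree, the author variables git already honours. A sketch of such an .envrc
(names and addresses are placeholders):

```shell
# .envrc -- loaded by direnv on entering the directory (assumes direnv is
# installed and hooked into the shell); git reads these variables directly.
export GIT_AUTHOR_NAME="Jane Developer"       # placeholder identity
export GIT_AUTHOR_EMAIL="jane@work.example"
# the committer side can be overridden the same way:
export GIT_COMMITTER_NAME="Jane Developer"
export GIT_COMMITTER_EMAIL="jane@work.example"
```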

[1] https://public-inbox.org/git/0f66ad7a-2289-2cce-6533-a27e19945...@rasmusvillemoes.dk/

Martin


Re: [wishlist] git submodule update --reset-hard

2018-12-06 Thread Stefan Beller
On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko  wrote:
>
> Dear Git Gurus,
>
> I wondered what would be your take on my wishlist request to add
> --reset-hard option, which would be very similar to regular "update" which
> checks out necessary commit, but I want it to remain in the branch.

What if the branch differs from the sha1 recorded in the superproject?

> Rationale: In DataLad we heavily rely on submodules, and we have established
> easy ways to do some manipulations across full hierarchies of them. E.g. a
> single command could introduce a good number of commits across deep hierarchy
> of submodules, e.g. while committing changes within deep submodule, while also
> doing all necessary commits in the repositories leading to that submodule so
> the entire tree of them stays in a "clean" state. The difficulty comes when
> there is a need to just "forget" some changes.  The obvious way is to e.g.
>
>git reset --hard PREVIOUS_STATE

  git reset --hard --recurse-submodules HEAD

would do the trick

> in the top level repository.  But that leaves all the submodules now in
> the undesired state.  If I do

Undesirable in the sense of still having local changes (which the above
reset with `--recurse` would fix), or of having changed branch state
(i.e. now detached but on a branch before)?

>   git submodule update --recursive
>
> I would end up in the detached HEADs within submodules.
>
> What I want is to retain current branch they are at (or may be possible
> "were in"? reflog records might have that information)

So something like

  git submodule foreach --recursive git reset --hard

?
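As a concrete sketch of that idea, relying only on documented behavior
(`git submodule foreach` exports `$sha1`, the commit recorded in the
immediate superproject), this keeps each submodule on its current branch
while hard-resetting it to the recorded commit:

```shell
# Reset every submodule's currently checked-out branch to the gitlink
# recorded in its superproject, without detaching HEAD.
git submodule foreach --recursive 'git reset --hard "$sha1"'
```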

You may be interested in
https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/
which introduces a switch `submodule.repoLike [ = true]`, which
when set would not detach HEAD in submodules.

Can you say more about the first question above:
Would you typically have situations where the
submodule branch is out of sync with the superproject
and how do you deal with that?

Adding another mode to `git submodule update` sounds
reasonable to me, too.

Stefan


enhancement: support for author.email and author.name in "git config"

2018-12-06 Thread William Hubbs
Hi all,

We are in a situation where we would like to use author information that is
different from committer information when we commit to certain
repositories.

Currently, it looks like there are two ways to do this, and I'm not sure
how to make either of them work well.

There are the GIT_AUTHOR_EMAIL and GIT_AUTHOR_NAME environment
variables, but these would have to be set globally. Also, there is the
--author command-line switch for the "git commit" command, but it is
easy to forget to use.
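For reference, the two existing mechanisms described above look like
this in practice (the name and email are placeholders):

```shell
# One-off author override via environment variables:
GIT_AUTHOR_NAME="Jane Contributor" GIT_AUTHOR_EMAIL="jane@work.example" \
  git commit -m "change"

# Equivalent one-off override via the command-line switch:
git commit --author="Jane Contributor <jane@work.example>" -m "change"
```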

Is there something I'm missing?

If not, I would like to propose the addition of author.email and
author.name settings to the git-config system.

Additionally you could add committer.name and committer.email, but the
only reason I bring the committer variations up is consistency since I
see you also have GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment
variables.

Does anyone have any thoughts on this?

I don't think either of us is on the mailing list, so please keep us in
CC's when you reply.

Thanks,

William


Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others

2018-12-06 Thread Stefan Beller
On Thu, Dec 6, 2018 at 6:58 AM Phillip Wood  wrote:

> > So is there some "must be at least two consecutive lines" condition for
> > not-plain, or is something else going on here?
>
> To be considered, a block has to have at least 20 alphanumeric
> characters - see commit f0b8fb6e59 ("diff: define block by number of
> alphanumeric chars", 2017-08-15). This stops things like random '}'
> lines being marked as moved on their own.

This is spot on.

All but the "plain" mode use the concept of "blocks" of code
(there is even one mode called "blocks", which adds to the confusion).

> It might be better to use some kind of frequency
> information (a bit like python's difflib junk parameter) instead so that
> (fairly) unique short lines also get marked properly.

Yes, that is what I was initially thinking about. However, to have good
information you'd need to index a whole lot (the whole repository,
i.e. all text blobs in existence?) to get an accurate picture of the
frequency information -- which I'd prefer to call entropy, coming from a
background familiar with https://en.wikipedia.org/wiki/Information_theory.
I am not sure where the term 'frequency information' comes from, but it
sounds like the same concept.

Of course it is too expensive to run an operation O(repository size)
just for this diff, so maybe we could get away with some smaller
corpus to build up this information on what is sufficient for coloring.

When only looking at the given diff, I would imagine that each line
would not carry a whole lot of information as its characters occur
rather frequently compared to the rest of the diff.
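A toy illustration of that idea, restricted to the diff itself rather
than the whole repository (a rough sketch of the concept, not what git
does internally): count how often each changed line occurs, so that rare
(high-entropy) lines stand out from common ones like a lone '}'.

```shell
# Frequency of each added/removed line in the diff against HEAD~1;
# the rarest lines are the strongest candidates for move detection.
git diff HEAD~1 | grep -E '^[+-][^+-]' | sed 's/^[+-]//' \
  | sort | uniq -c | sort -rn
```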

Best,
Stefan


[wishlist] git submodule update --reset-hard

2018-12-06 Thread Yaroslav Halchenko
Dear Git Gurus,

I wondered what would be your take on my wishlist request to add
--reset-hard option, which would be very similar to regular "update" which
checks out necessary commit, but I want it to remain in the branch.

Rationale: In DataLad we heavily rely on submodules, and we have established
easy ways to do some manipulations across full hierarchies of them. E.g. a
single command could introduce a good number of commits across deep hierarchy
of submodules, e.g. while committing changes within deep submodule, while also
doing all necessary commits in the repositories leading to that submodule so
the entire tree of them stays in a "clean" state. The difficulty comes when
there is a need to just "forget" some changes.  The obvious way is to e.g. 

   git reset --hard PREVIOUS_STATE

in the top level repository.  But that leaves all the submodules now in
the undesired state.  If I do

  git submodule update --recursive

I would end up in the detached HEADs within submodules.  

What I want is to retain current branch they are at (or may be possible
"were in"? reflog records might have that information)

Example:

# Have to use datalad install  since  git clone --recurse-submodules
# seems to not consider alternative locations for submodules' .git/
# with url being just a relative path, and where submodules aren't 
# all residing up under toplevel URL .git/

$> datalad install -r http://datasets.datalad.org/labs/gobbini/.git
[INFO   ] Cloning http://datasets.datalad.org/labs/gobbini/.git into 
'/tmp/gobbini' 
install(ok): /tmp/gobbini (dataset) 

[INFO   ] Installing  recursively 
[INFO   ] Cloning http://datasets.datalad.org/labs/gobbini/famface/.git 
into '/tmp/gobbini/famface' 
[INFO   ] Cloning 
http://datasets.datalad.org/labs/gobbini/famface/data/.git into 
'/tmp/gobbini/famface/data'   
[INFO   ] access to dataset sibling "datasets.datalad.org" not 
auto-enabled, enable with:   
|   datalad siblings -d "/tmp/gobbini/famface/data" enable 
-s datasets.datalad.org 
[INFO   ] Cloning 
http://datasets.datalad.org/labs/gobbini/famface/data/scripts/mridefacer/.git 
[2 other candidates] into '/tmp/gobbini/famface/data/scripts/mridefacer' 
action summary: 

  install (ok: 4)

so I have a hierarchy in a good state and all checked out in master
branch

$> cd gobbini

$> git submodule status --recursive   
 b9071a6bc9f7665f7c75549c63d29f16d40e8af7 famface (heads/master)
 e59ba76b42f219bdf14b6b547dd6d9cc0ed5227f famface/data 
(BIDS-v1.0.1-3-ge59ba76b)
 5d8036c0aaeebb448a00df6296ddc9f799efdd1f 
famface/data/scripts/mridefacer (heads/master)

$> git submodule foreach --recursive cat .git/HEAD 
Entering 'famface'
ref: refs/heads/master
Entering 'famface/data'
ref: refs/heads/master
Entering 'famface/data/scripts/mridefacer'
ref: refs/heads/master


and if I do roll back

$> git reset --hard HEAD^^^
HEAD is now at 9b4296d [DATALAD] aggregated meta data
changes on filesystem:  

 famface | 2 +-

and default update --recursive

$> git submodule update --recursive
Submodule path 'famface': checked out 
'2569ab436501a832d35afbbe9cc20ffeb6077eb1'
Submodule path 'famface/data': checked out 
'f1e8c9b8b025c311424283b9711efc6bc906ba2b'
Submodule path 'famface/data/scripts/mridefacer': checked out 
'49b0fe42696724c2a8492f999736056e51b77358'

I end up in detached HEADs

$> git submodule status --recursive 
 2569ab436501a832d35afbbe9cc20ffeb6077eb1 famface (2569ab4)
 f1e8c9b8b025c311424283b9711efc6bc906ba2b famface/data (BIDS-v1.0.1)
 49b0fe42696724c2a8492f999736056e51b77358 
famface/data/scripts/mridefacer (49b0fe4)


I do see that there is a "custom command" way to do it via the
"submodule.<name>.update" config setting, but that is not easy to use for
my case, since all the `<name>`s would be different, so I would have to
specify !git reset --hard for each of them via a config option, and I
could not find any way to "glob" config keys (like submodule.*.update).
But in effect that is probably what I need:

# restarting from a clean state here
$> git -c submodule.famface.update='!git reset --hard' submodule update 
--recursive
HEAD is now at 2569ab4 [DATALAD] aggregated meta data
Submodule path 'famface': 'git reset --hard 
2569ab436501a832d35afbbe9cc20ffeb6077eb1'
Submodule path 'famface/data': checked out 
'f1e8c9b8b025c311424283b9711efc6bc906ba2b'
Submodule path 'famface/data/scripts/mridefacer': checked out 

Re: [BUG REPORT] Git does not correctly replay bisect log

2018-12-06 Thread Lukáš Krejčí
On Thu, 2018-12-06 at 17:31 +0100, Christian Couder wrote:
> > When Git replays the bisect log, it only updates refs/bisect/bad,
> > refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in
> > .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies
> > that all good commits are ancestors of the bad commit, and if not, the
> > message "Bisecting: a merge base must be tested" is printed and the
> > branch is switched to the merge base of the bad and all the good
> > commits.
> 
> I am not sure if you are talking about running `git bisect replay` or
> sourcing the log in the above.

I am talking about `git bisect replay`. The shell script, as far as I
can see, only updates the references (ref/bisect/*) and never checks if
the revisions marked as 'good' are ancestors of the 'bad' one.
Therefore, $GIT_DIR/BISECT_ANCESTORS_OK file is never created.

The first time the ancestors are checked is in the helper (`git
bisect--helper --next-all`), which has only limited information from
refs/bisect*.
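For anyone following along, the bisect state discussed in this thread
lives in plain files under .git (internal layout, so it may change
between git versions):

```shell
# Inspect the on-disk bisect state (run inside a repo mid-bisection):
ls .git/refs/bisect/             # bad, good-*, skip-* refs
cat .git/BISECT_LOG              # the log that `git bisect replay` rebuilds
test -f .git/BISECT_ANCESTORS_OK \
  && echo "merge bases already checked" \
  || echo "merge bases not yet verified"
```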



Re: [BUG REPORT] Git does not correctly replay bisect log

2018-12-06 Thread Christian Couder
Hi,

On Thu, Dec 6, 2018 at 3:43 PM Lukáš Krejčí  wrote:
>
> Hello again,
>
> after looking into this today, I'm not sure if this can be considered a
> bug - it's just that I expected Git to check out the exact commit to
> test that was there before resetting the bisect. That made me uncertain
> whether Git restored the correct state.
>
> When I looked at what Git actually does, it became clear that the
> behavior is not incorrect but perhaps a bit surprising.

Yeah, I agree. I suspect, but I am not sure, that the difference of
behavior is because in one case we only check merge bases once at the
beginning (maybe because the BISECT_ANCESTORS_OK file always exists)
while in the other case we check them more than once during the
bisection. I haven't had time to look closely at this, but I would
like to.

> When Git replays the bisect log, it only updates refs/bisect/bad,
> refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in
> .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies
> that all good commits are ancestors of the bad commit, and if not, the
> message "Bisecting: a merge base must be tested" is printed and the
> branch is switched to the merge base of the bad and all the good
> commits.

I am not sure if you are talking about running `git bisect replay` or
sourcing the log in the above.

> Basically, some state is lost because Git "forgot" the first good
> commit from the log already was an ancestor of the first bad one.

The BISECT_ANCESTORS_OK file should be there to avoid forgetting that
we already checked the merge bases.

> In other words, it's as if I just started the bisect with the following
> commands and just pasted the whole bisect log to .git/BISECT_LOG:
>
> $ git bisect start
> $ git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
> $ git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d
> $ git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0
> Bisecting: a merge base must be tested
> [1e4b044d22517cae7047c99038abb23243ca] Linux 4.18-rc4

Yeah, when we start a new bisection the BISECT_ANCESTORS_OK file
should be erased if it exists, while it shouldn't be erased when we
are already in the middle of an existing bisection.

[...]

> And indeed, marking the merge base as good switches to the correct
> commit after the bisect. Marking it as bad will fail, so at least you
> can't make a mistake after replaying the bisect log:
> $ git bisect bad
> The merge base 1e4b044d22517cae7047c99038abb23243ca is bad.
> This means the bug has been fixed between 
> 1e4b044d22517cae7047c99038abb23243ca and 
> [94710cac0ef4ee177a63b5227664b38c95bbf703 
> 958f338e96f874a0d29442396d6adf9c1e17aa2d].

Yeah, I think this works as expected.

> Once again, I'm sorry for the noise. I guess it wasn't clear from the
> man page that something like this could happen and that made me think
> that this was a bug.

No reason to be sorry, there might still be a bug related to the
BISECT_ANCESTORS_OK file or something. I hope I can take a look at
this more closely soon.

Thanks for your report and your work on this,
Christian.


[PATCH] Indent code with TABs

2018-12-06 Thread Nguyễn Thái Ngọc Duy
We indent with TABs and sometimes for fine alignment, TABs followed by
spaces, but never all spaces (unless the indentation is less than 8
columns). Indenting with spaces slips through in some places. Fix
them.

Imported code and compat/ are left alone on purpose. The former should
remain as close to upstream as possible. The latter pretty much has
separate maintainers; it's up to them to decide.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Should be quite safe to merge since "git diff -b" is empty

 archive-tar.c |   2 +-
 archive.c |   4 +-
 builtin/add.c |   2 +-
 builtin/gc.c  |   2 +-
 cache-tree.c  |   2 +-
 convert.c |   6 +--
 git-compat-util.h |   2 +-
 parse-options.c   |   2 +-
 parse-options.h   |   6 +--
 quote.c   |   2 +-
 read-cache.c  | 118 +++---
 revision.c|   4 +-
 symlinks.c|   2 +-
 13 files changed, 77 insertions(+), 77 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index a58e1a8ebf..4aabd566fb 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -142,7 +142,7 @@ static int stream_blocked(const struct object_id *oid)
  * string and appends it to a struct strbuf.
  */
 static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
- const char *value, unsigned int valuelen)
+const char *value, unsigned int valuelen)
 {
int len, tmp;
 
diff --git a/archive.c b/archive.c
index fd556c28e4..c2fe16ad33 100644
--- a/archive.c
+++ b/archive.c
@@ -36,8 +36,8 @@ void init_archivers(void)
 }
 
 static void format_subst(const struct commit *commit,
- const char *src, size_t len,
- struct strbuf *buf)
+const char *src, size_t len,
+struct strbuf *buf)
 {
char *to_free = NULL;
struct strbuf fmt = STRBUF_INIT;
diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..12247b48fd 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -176,7 +176,7 @@ static void refresh(int verbose, const struct pathspec 
*pathspec)
die(_("pathspec '%s' did not match any files"),
pathspec->items[i].match);
}
-free(seen);
+   free(seen);
 }
 
 int run_add_interactive(const char *revision, const char *patch_mode,
diff --git a/builtin/gc.c b/builtin/gc.c
index 871a56f1c5..7696017cd4 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -317,7 +317,7 @@ static void add_repack_all_option(struct string_list 
*keep_pack)
 
 static void add_repack_incremental_option(void)
 {
-   argv_array_push(, "--no-write-bitmap-index");
+   argv_array_push(, "--no-write-bitmap-index");
 }
 
 static int need_to_gc(void)
diff --git a/cache-tree.c b/cache-tree.c
index 9d454d24bc..d6dbbebfb2 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -448,7 +448,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 }
 
 static void write_one(struct strbuf *buffer, struct cache_tree *it,
-  const char *path, int pathlen)
+ const char *path, int pathlen)
 {
int i;
 
diff --git a/convert.c b/convert.c
index e0848226d2..5f60c11ce0 100644
--- a/convert.c
+++ b/convert.c
@@ -705,7 +705,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 }
 
 static int apply_single_file_filter(const char *path, const char *src, size_t 
len, int fd,
-struct strbuf *dst, const char *cmd)
+   struct strbuf *dst, const char *cmd)
 {
/*
 * Create a pipeline to have the command filter the buffer's
@@ -1091,7 +1091,7 @@ static int count_ident(const char *cp, unsigned long size)
 }
 
 static int ident_to_git(const char *path, const char *src, size_t len,
-struct strbuf *buf, int ident)
+   struct strbuf *buf, int ident)
 {
char *dst, *dollar;
 
@@ -1135,7 +1135,7 @@ static int ident_to_git(const char *path, const char 
*src, size_t len,
 }
 
 static int ident_to_worktree(const char *path, const char *src, size_t len,
- struct strbuf *buf, int ident)
+struct strbuf *buf, int ident)
 {
struct object_id oid;
char *to_free = NULL, *dollar, *spc;
diff --git a/git-compat-util.h b/git-compat-util.h
index 09b0102cae..f281aa5185 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -721,7 +721,7 @@ extern const char *githstrerror(int herror);
 #ifdef NO_MEMMEM
 #define memmem gitmemmem
 void *gitmemmem(const void *haystack, size_t haystacklen,
-const void *needle, size_t needlelen);
+   const void *needle, size_t needlelen);
 #endif
 
 #ifdef OVERRIDE_STRDUP
diff --git a/parse-options.c b/parse-options.c
index 3b874a83a0..27353c8e8d 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -236,7 +236,7 @@ static int 

Re: gitweb: local configuration not found

2018-12-06 Thread Martin Mareš
Hello!

> Yeah, it does look indirect.  Despite what you said, it also would
> support users giving an absolute path via GITWEB_CONFIG.
> 
> With "use File::Spec", perhaps something like this?

Yes, this looks right.

Martin


Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others

2018-12-06 Thread Phillip Wood

Hi Ævar

On 06/12/2018 13:54, Ævar Arnfjörð Bjarmason wrote:

Let's ignore how bad this patch is for git.git, and just focus on how
diff.colorMoved treats it:

 diff --git a/builtin/add.c b/builtin/add.c
 index f65c172299..d1155322ef 100644
 --- a/builtin/add.c
 +++ b/builtin/add.c
 @@ -6,5 +6,3 @@
  #include "cache.h"
 -#include "config.h"
  #include "builtin.h"
 -#include "lockfile.h"
  #include "dir.h"
 diff --git a/builtin/am.c b/builtin/am.c
 index 8f27f3375b..eded15aa8a 100644
 --- a/builtin/am.c
 +++ b/builtin/am.c
 @@ -6,3 +6,2 @@
  #include "cache.h"
 -#include "config.h"
  #include "builtin.h"
 diff --git a/builtin/blame.c b/builtin/blame.c
 index 06a7163ffe..44a754f190 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -8,3 +8,2 @@
  #include "cache.h"
 -#include "config.h"
  #include "color.h"
 diff --git a/cache.h b/cache.h
 index ca36b44ee0..ea8d60b94a 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -4,2 +4,4 @@
  #include "git-compat-util.h"
 +#include "config.h"
 +#include "new.h"
  #include "strbuf.h"

This is a common thing that's useful to have highlighted, e.g. we move
includes of config.h to some common file, so I want to see all the
deleted config.h lines marked as moved into the cache.h line, the
"lockfile.h" line I removed while I was at it shown as a plain removal,
and the new "new.h" as a plain addition.

Exactly that is what you get with diff.colorMoved=plain, but the default
of diff.colorMoved=zebra gets confused by this and highlights no moves
at all; same for "blocks" and "dimmed-zebra".

So at first I thought this had something to do with the many->one
detection, but it seems to be simpler, we just don't detect a move of
1-line with anything but plain, e.g. this works as expected in all modes
and detects the many->one:

 diff --git a/builtin/add.c b/builtin/add.c
 index f65c172299..f4fda75890 100644
 --- a/builtin/add.c
 +++ b/builtin/add.c
 @@ -5,4 +5,2 @@
   */
 -#include "cache.h"
 -#include "config.h"
  #include "builtin.h"
 diff --git a/builtin/branch.c b/builtin/branch.c
 index 0c55f7f065..52e39924d3 100644
 --- a/builtin/branch.c
 +++ b/builtin/branch.c
 @@ -7,4 +7,2 @@

 -#include "cache.h"
 -#include "config.h"
  #include "color.h"
 diff --git a/cache.h b/cache.h
 index ca36b44ee0..d4146dbf8a 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -3,2 +3,4 @@

 +#include "cache.h"
 +#include "config.h"
  #include "git-compat-util.h"

So is there some "must be at least two consecutive lines" condition for
not-plain, or is something else going on here?


To be considered, a block has to have at least 20 alphanumeric characters - 
see commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars", 
2017-08-15). This stops things like random '}' lines being marked as 
moved on their own. It might be better to use some kind of frequency 
information (a bit like python's difflib junk parameter) instead so that 
(fairly) unique short lines also get marked properly.
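The modes under discussion can all be compared on the same diff; these
options are documented in git-diff(1):

```shell
# Try each move-detection mode on the same change:
git diff --color-moved=plain HEAD~1
git diff --color-moved=zebra HEAD~1    # what diff.colorMoved=true selects
git diff --color-moved=blocks HEAD~1
git diff --color-moved=dimmed-zebra HEAD~1
```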


Best Wishes

Phillip


Re: [BUG REPORT] Git does not correctly replay bisect log

2018-12-06 Thread Lukáš Krejčí
Hello again,

after looking into this today, I'm not sure if this can be considered a
bug - it's just that I expected Git to check out the exact commit to
test that was there before resetting the bisect. That made me uncertain
whether Git restored the correct state.

When I looked at what Git actually does, it became clear that the
behavior is not incorrect but perhaps a bit surprising.

When Git replays the bisect log, it only updates refs/bisect/bad,
refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in
.git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies
that all good commits are ancestors of the bad commit, and if not, the
message "Bisecting: a merge base must be tested" is printed and the
branch is switched to the merge base of the bad and all the good
commits.

Basically, some state is lost because Git "forgot" the first good
commit from the log already was an ancestor of the first bad one.
In other words, it's as if I just started the bisect with the following
commands and just pasted the whole bisect log to .git/BISECT_LOG:

$ git bisect start
$ git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
$ git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d
$ git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0
Bisecting: a merge base must be tested
[1e4b044d22517cae7047c99038abb23243ca] Linux 4.18-rc4

(here's the full bisect log again for reference)
git bisect start
# bad: [5b394b2ddf0347bef56e50c69a58773c94343ff3] Linux 4.19-rc1
git bisect bad 5b394b2ddf0347bef56e50c69a58773c94343ff3
# good: [94710cac0ef4ee177a63b5227664b38c95bbf703] Linux 4.18
git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703
# bad: [54dbe75bbf1e189982516de179147208e90b5e45] Merge tag 
'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm
git bisect bad 54dbe75bbf1e189982516de179147208e90b5e45
# bad: [0a957467c5fd46142bc9c52758ffc552d4c5e2f7] x86: i8259: Add missing 
include file
git bisect bad 0a957467c5fd46142bc9c52758ffc552d4c5e2f7
# good: [958f338e96f874a0d29442396d6adf9c1e17aa2d] Merge branch 'l1tf-final' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d
# bad: [2c20443ec221dcb76484b30933593e8ecd836bbd] Merge tag 'acpi-4.19-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 2c20443ec221dcb76484b30933593e8ecd836bbd
# bad: [c2fc71c9b74c1e87336a27dba1a5edc69d2690f1] Merge tag 'mtd/for-4.19' of 
git://git.infradead.org/linux-mtd
git bisect bad c2fc71c9b74c1e87336a27dba1a5edc69d2690f1
# bad: [b86d865cb1cae1e61527ea0b8977078bbf694328] blkcg: Make 
blkg_root_lookup() work for queues in bypass mode
git bisect bad b86d865cb1cae1e61527ea0b8977078bbf694328
# bad: [1b0d274523df5ef1caedc834da055ff721e4d4f0] nvmet: don't use uuid_le type
git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0

And indeed, marking the merge base as good switches to the correct
commit after the bisect. Marking it as bad will fail, so at least you
can't make a mistake after replaying the bisect log:
$ git bisect bad
The merge base 1e4b044d22517cae7047c99038abb23243ca is bad.
This means the bug has been fixed between 
1e4b044d22517cae7047c99038abb23243ca and 
[94710cac0ef4ee177a63b5227664b38c95bbf703 
958f338e96f874a0d29442396d6adf9c1e17aa2d].

Once again, I'm sorry for the noise. I guess it wasn't clear from the
man page that something like this could happen and that made me think
that this was a bug.



How to investigate further a seemingly 'ghost' commit (after a merge)?

2018-12-06 Thread fan2linux
Hello, 

I am trying to understand how a fix from a bug-correction branch vanished and 
the bug found its way back into the main branch after two merges. 

I am using git version 2.19.2. 

Checking out tag 18.40.1 and checking its graph: 

> $ git checkout 18.40.1 
> 
> $ git log --oneline --graph 
> * ee4721a1 (HEAD, tag: 18.40.1) Merge branch 'suppressionNumAdhCTP' into 
> 'homol2' 
> |\ 
> | * 52aeaf64 Retire le numéro de contrat et le numéro d'adhérent qui étaient 
> affichés en haut de la page et empiétaient sur le préimprimé. 
> |/ 
> * 9f68ec4b (tag: 18.40.0) Merge branch 'cherry-pick-7c956b5f' into 'homol2' 

For the record, here is the git log excerpt on the file in question: 

> $ git log app/CTPCB-AC001/CTPCB-AC001.tex 
> commit 52aeaf64c808c1e3ee8c5cbf5f0221d4e8a7699d 
> Author: Guillaume Ballin < guillaume.bal...@mnt.fr > 
> Date: Tue Nov 13 11:52:03 2018 +0100 
> 
> Retire le numéro de contrat et le numéro d'adhérent qui étaient affichés en 
> haut de la page et empiétaient sur le préimprimé. 
> 
> commit 9336db5c25bb0f3af19183fe1db2fb05ce28b9f3 
> Author: arnavander < arnavan...@mnt.fr > 
> Date: Mon Nov 5 14:46:27 2018 + 
> 
> Correction retour anomalie de Carte TP 
> 
> Signed-off-by: arnavander < arnavan...@mnt.fr > 
> 
> 
> (cherry picked from commit 7c956b5f29bf23c624684c9300a13abecfb451c5) 

Checking out commit 25ca67a9 and checking its graph: 

> $ git checkout 25ca67a9 
> 
> $ git log --oneline --graph 
> * 25ca67a9 (HEAD) Merge branch 'homol2' into 'homol' 
> |\ 
> | * ee4721a1 (tag: 18.40.1) Merge branch 'suppressionNumAdhCTP' into 'homol2' 
> | |\ 
> | | * 52aeaf64 Retire le numéro de contrat et le numéro d'adhérent qui 
> étaient affichés en haut de la page et empiétaient sur le préimprimé. 
> | |/ 
> | * 9f68ec4b (tag: 18.40.0) Merge branch 'cherry-pick-7c956b5f' into 'homol2' 

Checking the log of that file again: 

> $ git log app/CTPCB-AC001/CTPCB-AC001.tex 
> commit 7c956b5f29bf23c624684c9300a13abecfb451c5 
> Author: arnavander < arnavan...@mnt.fr > 
> Date: Mon Nov 5 15:46:27 2018 +0100 
> 
> Correction retour anomalie de Carte TP 
> 
> Signed-off-by: arnavander < arnavan...@mnt.fr > 

Commit 52aeaf64c808c1e3ee8c5cbf5f0221d4e8a7699d does not show up any more in 
the history pointed to by commit 25ca67a9! 

The fix only deleted a block of lines: 

> $ git diff 9f68ec4b 52aeaf64 
> diff --git a/app/CTPCB-AC001/CTPCB-AC001.tex 
> b/app/CTPCB-AC001/CTPCB-AC001.tex 
> index 148b3ed2..741d3dff 100644 
> --- a/app/CTPCB-AC001/CTPCB-AC001.tex 
> +++ b/app/CTPCB-AC001/CTPCB-AC001.tex 
> @@ -122,12 +122,6 @@ 
> } 
> 
> \newcommand{\CarteTP}{ 
> - \begin{textblock*}{200pt}(50pt,225pt)% 
> - \No contrat : \NUMADH\\ 
> - \ifdefined\NUMABA 
> - \No adhérent : \NUMABA\\ 
> - \fi 
> - \end{textblock*}% 
> \edToAddrList{DADDR} 
> 
> %PREMIERE CARTE 

How can a commit disappear like that from the log of a file (while of 
course not patching the file) yet stay visible in the git log of the 
branch? It's as if the grandchildren forgot a grandparent commit while 
staying connected to their parents... 
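For what it's worth, two commands that help with exactly this kind of
investigation (by default `git log -- <path>` simplifies history and can
hide a commit whose change a later merge discarded):

```shell
# Keep simplified-away commits visible for the file:
git log --full-history --oneline -- app/CTPCB-AC001/CTPCB-AC001.tex

# Show how the merge itself treated the file; empty output means the
# merge took one parent's version of it wholesale:
git show 25ca67a9 -- app/CTPCB-AC001/CTPCB-AC001.tex
```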

--
Fan2linux - G. Ballin


A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others

2018-12-06 Thread Ævar Arnfjörð Bjarmason
Let's ignore how bad this patch is for git.git, and just focus on how
diff.colorMoved treats it:

diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..d1155322ef 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -6,5 +6,3 @@
 #include "cache.h"
-#include "config.h"
 #include "builtin.h"
-#include "lockfile.h"
 #include "dir.h"
diff --git a/builtin/am.c b/builtin/am.c
index 8f27f3375b..eded15aa8a 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -6,3 +6,2 @@
 #include "cache.h"
-#include "config.h"
 #include "builtin.h"
diff --git a/builtin/blame.c b/builtin/blame.c
index 06a7163ffe..44a754f190 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -8,3 +8,2 @@
 #include "cache.h"
-#include "config.h"
 #include "color.h"
diff --git a/cache.h b/cache.h
index ca36b44ee0..ea8d60b94a 100644
--- a/cache.h
+++ b/cache.h
@@ -4,2 +4,4 @@
 #include "git-compat-util.h"
+#include "config.h"
+#include "new.h"
 #include "strbuf.h"

This is a common thing that's useful to have highlighted, e.g. we move
includes of config.h to some common file, so I want to se all the
deleted config.h lines as moved into the cache.h line, and then the
"lockfile.h" I removed while I was at it plain remove, and the new
"new.h" plain added.

Exactly that is what you get with diff.colorMoved=plain, but the default
of diff.colorMoved=zebra gets confused by this and highlights no moves
at all, same or "blocks" and "dimmed-zebra".

So at first I thought this had something to do with the many->one
detection, but it seems to be simpler, we just don't detect a move of
1-line with anything but plain, e.g. this works as expected in all modes
and detects the many->one:

diff --git a/builtin/add.c b/builtin/add.c
index f65c172299..f4fda75890 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -5,4 +5,2 @@
  */
-#include "cache.h"
-#include "config.h"
 #include "builtin.h"
diff --git a/builtin/branch.c b/builtin/branch.c
index 0c55f7f065..52e39924d3 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -7,4 +7,2 @@

-#include "cache.h"
-#include "config.h"
 #include "color.h"
diff --git a/cache.h b/cache.h
index ca36b44ee0..d4146dbf8a 100644
--- a/cache.h
+++ b/cache.h
@@ -3,2 +3,4 @@

+#include "cache.h"
+#include "config.h"
 #include "git-compat-util.h"

So is there some "must be at least two consecutive lines" condition for
not-plain, or is something else going on here?


Re: [PATCH 2/2] commit-graph: fix buffer read-overflow

2018-12-06 Thread Derrick Stolee

On 12/5/2018 5:32 PM, Josh Steadmon wrote:
  
+		if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > data + graph_size) {
+			error(_("chunk lookup table entry missing; graph file may be incomplete"));
+			free(graph);
+			return NULL;
+		}


Something I forgot earlier: there are several tests in 
t5318-commit-graph.sh that use 'git commit-graph verify' to ensure we 
hit these error conditions on a corrupted commit-graph file. Could you 
try adding a test there that looks for this error message?


Thanks,
-Stolee


[PATCH] docs: fix $strict_export text in gitweb.conf.txt

2018-12-06 Thread Denis Ovsienko
The section used to discuss $gitweb_export_ok and $gitweb_list, but the
gitweb Perl code has no such variables (likely a holdover from
GITWEB_EXPORT_OK and GITWEB_LIST, respectively). Fix the section to
spell $export_ok and $projects_list like the rest of the document.

Signed-off-by: Denis Ovsienko 
---
 Documentation/gitweb.conf.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index c0a326e38..83b4388c2 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -207,8 +207,8 @@ subsection on linkgit:gitweb[1] manpage.
 
 $strict_export::
Only allow viewing of repositories also shown on the overview page.
-   This for example makes `$gitweb_export_ok` file decide if repository is
-   available and not only if it is shown.  If `$gitweb_list` points to
+   This for example makes `$export_ok` file decide if repository is
+   available and not only if it is shown.  If `$projects_list` points to
file with list of project, only those repositories listed would be
available for gitweb.  Can be set during building gitweb via
`GITWEB_STRICT_EXPORT`.  By default this variable is not set, which
-- 
2.17.1




Re: [PATCH v2] l10n: update German translation

2018-12-06 Thread phillip

Hi,

thanks for your great work! Just two remarks:


  #: midx.c:407
-#, fuzzy, c-format
+#, c-format
  msgid "failed to add packfile '%s'"
-msgstr "Fehler beim Lesen der Reihenfolgedatei '%s'."
+msgstr "Fehler beim Hinzufügen von Packdatei'%s'."


A space is missing: "Fehler beim Hinzufügen von Packdatei '%s'."


  #: run-command.c:1229
-#, fuzzy, c-format
+#, c-format
  msgid "cannot create async thread: %s"
-msgstr "kann Thread nicht erzeugen: %s"
+msgstr "Kann Thread für async nicht erzeugen: %s"


I think we should use "Konnte" here.


Best regards,

Phillip






Re: Any way to make git-log to enumerate commits?

2018-12-06 Thread Konstantin Khomoutov
On Thu, Dec 06, 2018 at 09:31:36AM +0900, Junio C Hamano wrote:

> >> It would be great if git-log has a formatting option to insert an
> >> index of the current commit since HEAD.
> >> 
> >> It would allow after quitting the git-log to immediately fire up "git
> >> rebase -i HEAD~index" instead of "git rebase -i
> >> go-copy-paste-this-long-number-id".
> >
> > This may have little sense in a general case as the history maintained
> > by Git is a graph, not a single line. Hence your prospective approach
> > would only work for cases like `git log` called with the
> > "--first-parent" command-line option.
> 
> I do not see why the "name each rev relative to HEAD" formatting
> option cannot produce HEAD^2~2 etc.
> 
> It would be similar to "git log | git name-rev --stdin" but I do not
> offhand recall if we had a way to tell name-rev to use only HEAD as
> the anchoring point.

My reading was that the OP explicitly wanted to just glance at a single
integer number and use it right away in a subsequent rebase command.

I mean, use one's own short memory instead of copying and pasting.

The way I decided to format the reference in my sketch script — using
HEAD~ — is just a byproduct of the fact I was aware both of the
"gitrevisions" manual page and the fact `git name-rev` exists (though I
regretfully was not aware it's able to process a stream of `git log`).

Hence getting fancy names for revisions would be technically correct,
but more error-prone for retyping from memory ;-)
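
As an illustration of the single-integer numbering the OP asked for, here is a small sketch (only an illustration; it assumes a linear, first-parent history, and the demo builds a throwaway repository so the pipeline has something to number):

```shell
#!/bin/sh
# Sketch: print each commit prefixed with its distance from HEAD,
# so "git rebase -i HEAD~N" can be typed from short memory.
# In real use, only the final pipeline is needed.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m three
# NR-1 is the distance from HEAD along the first-parent chain.
git log --first-parent --oneline | awk '{ printf "HEAD~%d %s\n", NR-1, $0 }'
```

This only stays meaningful as long as the walk is linear; on a merge-heavy history the HEAD~N shorthand stops naming what the list shows, which is the caveat raised earlier in the thread.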



Bug: git add --patch does not honor "diff.noprefix"

2018-12-06 Thread Christian Weiske
Hi,


When running "git add -p" on git version 2.19.2 and "diff.noprefix" set
to true, it still shows the "a/" and "b/" prefixes.

This issue has been reported in 2016 already[1], but is still there in
2.19.2.


[1]
https://public-inbox.org/git/e1d7329a-a54b-4d09-a72a-62eca8005...@gmail.com/T/

-- 
Regards/Mit freundlichen Grüßen
Christian Weiske

-=≡ Geeking around in the name of science since 1982 ≡=-


[ANNOUNCE] Git Contributor Summit Registration, Jan 31, 2019, Brussels

2018-12-06 Thread Jeff King
Registration is now open for the Contributor Summit at Git Merge. To
recap from my earlier announcement[1]:

  When: Thursday, January 31, 2019. 10am-5pm.
  Where: The Egg[2], Brussels, Belgium
  What: Round-table discussion about Git
  Who: All contributors to Git or related projects in the Git ecosystem
   are invited; if you're not sure if you qualify, please ask!

Registering for the contributor summit requires a special code. Please
email me off-list to get one.

As with past years, the code will unlock a ticket for both the contrib
summit _and_ the main conference (on Friday, Feb 1st). So don't register
separately for the main conference. Also, as in past years, there are
two variants: a free ticket, and a €99 one. You are free to use either;
if you choose to pay, the money goes entirely to the Software Freedom
Conservancy.

If you'd like to come but need financial assistance with travel costs,
please reach out to the Git PLC at g...@sfconservancy.org. And please do
so soon (let's say by Dec 13th), so we can make decisions and people can
book their travel.

-Peff

[1] https://public-inbox.org/git/20181109104202.ga8...@sigill.intra.peff.net/

[2] This is the same venue as 2017: https://goo.gl/maps/E36qCGJhK8J2


Re: git, monorepos, and access control

2018-12-06 Thread Jeff King
On Thu, Dec 06, 2018 at 10:17:24AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > The other major user of that feature I can think of is LFS. There Git
> > ends up diffing the LFS pointers, not the big files. Which arguably is
> > the wrong thing (you'd prefer to see the actual file contents diffed),
> > but I think nobody cares in practice because large files generally don't
> > have readable diffs anyway.
> 
> I don't use this either, but I can imagine people who use binary files
> via clean/smudge would be well served by dumping out textual metadata of
> the file for diffing instead of showing nothing.
> 
> E.g. for a video file I might imagine having lines like:
> 
> duration-seconds: 123
> camera-model: Shiny Thingamabob
> 
> Then when you check in a new file your "git diff" will show (using
> normal diff view) that:
> 
>- duration-seconds: 123
>+ duration-seconds: 321
> camera-model: Shiny Thingamabob

I think that's orthogonal to clean/smudge, though. Neither the in-repo
nor on-disk formats are going to show that kind of output. For that
you'd want a separate textconv filter (and fun fact: showing exif data
was actually the original use case for which I wrote textconv).

If you are using something like LFS, using textconv on top is a little
trickier, because we'd always feed the filter the LFS pointer file, not
the actual data contents. Doing the "reversal" that Junio suggested
would fix that. Or with the code as it is, you can simply define your
filter to convert the LFS pointer data into the real content. I don't
really use LFS, but it looks like:

  [diff "mp4"]
  textconv = git lfs smudge | extract-metadata

would probably work.

-Peff
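
For completeness, a diff driver configured that way only applies to paths carrying a matching attribute; a minimal sketch (extract-metadata is a hypothetical tool, as above):

```
# .gitattributes -- route *.mp4 files through the "mp4" diff driver
*.mp4 diff=mp4
```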


Re: git, monorepos, and access control

2018-12-06 Thread Ævar Arnfjörð Bjarmason


On Thu, Dec 06 2018, Jeff King wrote:

> On Thu, Dec 06, 2018 at 10:08:57AM +0900, Junio C Hamano wrote:
>
>> Jeff King  writes:
>>
>> > In my opinion this feature is so contrary to Git's general assumptions
>> > that it's likely to create a ton of information leaks of the supposedly
>> > protected data.
>> > ...
>>
>> Yup, with s/implemented/designed/, I agree all you said here
>> (snipped).
>
> Heh, yeah, I actually scratched my head over what word to use. I think
> Git _could_ be written in a way that is both compatible with existing
> repositories (i.e., is still recognizably Git) and is careful about
> object access control. But either way, what we have now is not close to
> that.
>
>> > Sorry I don't have a more positive response. What you want to do is
>> > perfectly reasonable, but I just think it's a mismatch with how Git
>> > works (and because of the security impact, one missed corner case
>> > renders the whole thing useless).
>>
>> Yup, again.
>>
>> Storing source files encrypted and decrypting with smudge filter
>> upon checkout (and those without the access won't get keys and will
>> likely to use sparse checkout to exclude these priviledged sources)
>> is probably the only workaround that does not involve submodules.
>> Viewing "diff" and "log -p" would still be a challenge, which
>> probably could use the same filter as smudge for textconv.
>
> I suspect there are going to be some funny corner cases there. I use:
>
>   [diff "gpg"]
>   textconv = gpg -qd --no-tty
>
> which works pretty well, but it's for files which are _never_ decrypted
> by Git. So they're encrypted in the working tree too, and I don't use
> clean/smudge filters.
>
> If the files are already decrypted in the working tree, then running
> them through gpg again would be the wrong thing. I guess for a diff
> against the working tree, we would always do a "clean" operation to
> produce the encrypted text, and then decrypt the result using textconv.
> Which would work, but is rather slow.
>
>> I wonder (and this is the primary reason why I am responding to you)
>> if it is common enough wish to use the same filter for smudge and
>> textconv?  So far, our stance (which can be judged from the way the
>> clean/smudge filters are named) has been that the in-repo
>> representation is the canonical, and the representation used in the
>> checkout is ephemeral, and that is why we run "diff", "grep",
>> etc. over the in-repo representation, but the "encrypted in repo,
>> decrypted in checkout" abuse would be helped by an option to do the
>> reverse---find changes and look substrings in the representation
>> used in the checkout.  I am not sure if there are other use cases
>> that is helped by such an option.
>
> Hmm. Yeah, I agree with your line of reasoning here. I'm not sure how
> common it is. This is the first I can recall it. And personally, I have
> never really used clean/smudge filters myself, beyond some toy
> experiments.
>
> The other major user of that feature I can think of is LFS. There Git
> ends up diffing the LFS pointers, not the big files. Which arguably is
> the wrong thing (you'd prefer to see the actual file contents diffed),
> but I think nobody cares in practice because large files generally don't
> have readable diffs anyway.

I don't use this either, but I can imagine people who use binary files
via clean/smudge would be well served by dumping out textual metadata of
the file for diffing instead of showing nothing.

E.g. for a video file I might imagine having lines like:

duration-seconds: 123
camera-model: Shiny Thingamabob

Then when you check in a new file your "git diff" will show (using
normal diff view) that:

   - duration-seconds: 123
   + duration-seconds: 321
camera-model: Shiny Thingamabob

etc.


Git fetch prints unchanged branches

2018-12-06 Thread stav alfi
Hi,

Every time I run `git fetch`, it reports the same 100+ branches that
have not changed at all, yet still lists them as new branches on the
server. It also prints the branches that did change.

I don't see this behavior in other repositories I contribute to. How do I fix it?

The same output of `git fetch` multiple times:



From github.com:Stratoscale/mui
 + 499a9ae65...63b55f08e Itai/ui-2837-validate-user-name -> origin/Itai/ui-2837-validate-user-name  (forced update)
 * [new branch]          1          -> origin/1
 * [new branch]          2          -> origin/2
 * [new branch]          3          -> origin/3
...


Thanks