Re: [PATCH] compat: Allow static initializer for pthreads on Windows

2016-10-26 Thread Johannes Sixt

Am 26.10.2016 um 23:57 schrieb Stefan Beller:

In Windows it is not possible to have a static initialized mutex as of
now, but that seems to be painful for the upcoming refactoring of the
attribute subsystem, as we have no good place to put the initialization
of the attr global lock.


Please rewrite the attribute system such that it can have a dynamic 
initialization. If you find a global initialization in main() too gross 
(I would agree) then setup_git_directory() might be the right place.


-- Hannes



Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Junio C Hamano
Peter Williams  writes:

> However, for git commands such as diff/status whose job is to display
> information it would be nice if they had a --recursive option to
> override the default submodule diff/status and show details of the
> changes in the submodules.  Sometimes you want to see the big picture
> in detail.

I won't disagree. My comment was only on this part from the original:

>> - We have to make separate commits and manage corresponding topic
>> branches for the superproject and subprojects.

and on this point, we seem to be in agreement.



Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> Yeah, I can make it work without exposing struct git_attr.
>
> You completely misunderstood me.  "struct git_attr" MUST be visible
> to the users so that they can ask for the name in git_check.attr[0].
>
> What would be nice to hide if you can is the function to intern a
> string into a pointer to struct git_attr, i.e. git_attr() function.

Even though that was not the primary point of my suggestion, I
actually think it is OK to make "struct git_attr" a structure that
is opaque to the users of the API if you wanted to.  The attr_check
structure will have an array of pointers to "struct git_attr", and
the structure definition may be visible to the public attr.h header,
but the API users won't have to be able to dereference the pointer
"struct git_attr *".  git_check.attr[0] would be a pointer to an
opaque structure from API users' point of view, that can be passed
to API function git_attr_name() to read its string.

What is nice to hide is the constructor of the structure.  What it,
i.e. "struct git_attr *git_attr(const char *)", needs to do is to
(1) see if the attribute object with the same name already exists in
the table of "all known attributes in the universe", and if there
is, return that instance, (2) otherwise create a new attribute
object, register it to the table and return it.  And it needs to do
it in a way that is thread-safe.

If we have to give access to it to the API users, then we'd need to
acquire and release the Big Attr Lock per each call.  

The calls to git_attr() you need to make in your implementation will
be made from two codepaths:

 * check_initl() acquires the Big Attr Lock, creates a check struct,
   makes multiple calls to git_attr() to construct the necessary
   git_attr instances to fill the array and then releases the lock,
   so the git_attr() constructor does not have to be protected for
   concurrent access.

 * check_attr() acquires the Big Attr Lock, calls down to
   prepare_attr_stack() as necessary to parse .gitattributes files
   found in the directory hierarchy, which makes calls to git_attr()
   to record the attributes found in the file.  Then it does the
   matching to fill results[] array and releases the lock.  Again,
   git_attr() constructors are called under the lock, so there is no
   need for a separate lock.

If these are the only callpaths that reach git_attr() to construct
new attribute objects, it would mean that you can make this private
to attr subsystem and hide it from the users of the API.

Otherwise, you would need to rename the git_attr() constructor that
used internally under the Big Lock to

static struct git_attr *git_attr_locked(const char *);

that is defined inside attr.c, and then provide the external version
as a thin wrapper that calls it under the Big Lock, i.e.

struct git_attr *git_attr(const char *s)
{
struct git_attr *attr;
take_big_attr_lock();
attr = git_attr_locked(s);
release_big_attr_lock();
return attr;
}

That will have to make the big attr lock busier, and it would be
good if we can avoid it.  That is where my "can we hide git_attr()
constructor?" comes from.



Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Peter Williams

On 27/10/16 09:59, Junio C Hamano wrote:

Stefan Beller  writes:


- We have to make separate commits and manage corresponding topic
branches for the superproject and subprojects.


Well yeah, that is how submodule work on a conceptual level.
While having multiple commits may seem like overhead, note
the subtle difference for these commits. One if deep down in the
stack patching one of the submodules, the other is a high level
commit advancing the submodule pointer.

Note that the target audience of these two commit messages
might be vastly different, hence can be worded differently.
(The submodule describing how you fixed e.g. a memleak or race condition
and the superproject describes on why you needed to include that submodule,
e.g. because you switched your toplevel application to use threads.)


Both good points.

Another thing to keep in mind is that in a well-organized project,
it is expected that you would have multiple commits in a submodule,
solving one single issue that is needed by the superproject in a
finer grained way, before the resulting submodule tip is recorded in
the tree of the superproject in one commit.  IOW, between the time
the superproject's history moves by one commit, the submodule may
have multiple commits in order for the submodule to become ready to
be consumed by the superproject.




I'm a relatively new user of submodules and I quite like them (having 
tried a few other strategies for sharing common code between multiple 
projects and found them quite painful) and find them fairly easy to use. 
 I especially like the fact that the submodule command isn't very 
complicated and that the best method for managing commits, etc in the 
submodule is to cd into their root directory and then treat them like 
any other git repository (greatly reducing the amount of new stuff that 
you have to learn in order to use them).  Also, from my experience so 
far, I see three different types of work going on within my workspaces 
that include submodules:


1. I'm working on changes to the submodule and using the superproject 
that it's checked out in to test those changes in which case most of the 
change is occurring in the submodule with changes in the superproject 
usually being small one related to API changes in the submodule.


2. I'm working on changes in the superproject and the only changes that 
get made in the submodules are to fix bugs uncovered by the work in the 
superproject.


3. I'm modifying a superproject to accommodate changes to a submodule 
that's changed as a result of having changes pulled from another repository.


In none of these cases do I feel the desire/need to commit the changes 
to the superproject and submodule(s) with a single commit command which 
more or less agrees with your points.


However, for git commands such as diff/status whose job is to display 
information it would be nice if they had a --recursive option to 
override the default submodule diff/status and show details of the 
changes in the submodules.  Sometimes you want to see the big picture in 
detail.




Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> Yeah, I can make it work without exposing struct git_attr.

You completely misunderstood me.  "struct git_attr" MUST be visible
to the users so that they can ask for the name in git_check.attr[0].

What would be nice to hide if you can is the function to intern a
string into a pointer to struct git_attr, i.e. git_attr() function.



Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Matt McCutchen
On Wed, 2016-10-26 at 19:03 -0700, Stefan Beller wrote:
> On Wed, Oct 26, 2016 at 6:52 PM, Matt McCutchen  
> wrote:
> > 4. I pushed several dangling submodule pointers before I learned I
> > could set push.recurseSubmodules = check.  This isn't the default; each
> > developer has to do it manually.  (In theory, I could put such things
> > in a setup script for them to run if they trust me.)
> 
> There is a current series in flight/for review that makes "check" default.
> (It is blocked as check has some performance issues when having lots
> of commits to be pushed, so it may take a while and not show up in the
> next release)

Great!  One other thing: IIRC, "check" does not distinguish between
different remotes.  For example, suppose I fork a project that already
has a submodule and I have a pair of repositories that pull from the
"upstream" repositories and push to "origin" repositories for my
project.  Suppose I upgrade to a new upstream version and find that I'm
(temporarily) able to use the upstream submodule without modifications.
 The "check" feature won't stop me from pushing a pointer into the
"origin" superproject that points to a commit that exists in the
"upstream" subproject but not the "origin" subproject.

> > 5. Stashing changes to both the superproject and the subproject takes
> > more steps.
> 
> True, so you'd want to have a `git stash --recurse-submodules={yes,no}`
> where the command line option is configurable, so you don't have to type
> it all the time?

Sounds good.  I'm sure you realize this is not just a matter of running
"git stash" in each submodule because there are many ways the stash
stacks could get out of lockstep.  The submodule content needs to be
incorporated into the superproject stash.

> Thanks for pointing out the issues though. they align to what
> we plan on doing for submodules, so ... the plan actually makes
> sense :)

Again, I'm thrilled you're working on this, even if I don't use it on
my current project.

Matt


Re: Expanding Includes in .gitignore

2016-10-26 Thread Stefan Beller
> The use case for this is where I did not write my own rules, but I want
> to keep them updated. https://github.com/github/gitignore is a damn good
> resource, but I want to pull it and include relevant bits project by
> project and/or system wide. I don't want to have to update many projects
> manually if that, or any other, repo changes.

.git/info/exclude could be a (sym)link to an up to date version
of the gitignore repo as a hack?


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 5:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>  extern struct git_attr *git_attr(const char *);
>> ...
>> +extern void git_attr_check_append(struct git_attr_check *,
>> +   const struct git_attr *);
>
> Another thing.  Do we still need to expose git_attr() to the outside
> callers?  If we change git_attr_check_append() to take "const char *"
> as its second parameter, can we retire it from the public interface?
>
> It being an "intern" function, by definition it is not thread-safe.
> Its use from prepare_attr_stack() inside git_check_attr() that takes
> the Big Attribute Subsystem Lock is safe, but the callers that do
>
> struct git_attr_check *check = ...;
> struct git_attr *text_attr = git_attr("text");
>
> git_attr_check_append(check, text_attr);
>
> would risk racing to register the same entry to the "all the
> attributes in the known universe" table.
>
> If the attribute API does not have to expose git_attr(const char *)
> to the external callers at all, that would be ideal.  Otherwise, we
> would need to rename the current git_attr() that is used internally
> under the Big Lock to
>
> static struct git_attr *git_attr_locked(const char*)
>
> that is defined inside attr.c, and then provide the external version
> as a thin wrapper that calls it under the Big Lock.
>
>

Yeah, I can make it work without exposing struct git_attr.
However then the struct git_attr_check will contain more of
internals exposed. (As the header file did not know the size
of git_attr, the patch under discussion even must use a double pointer
to a git_attr inside the git_attr_check as the git_attr size is unknown)

So I'll look into removing that struct git_attr completely.


Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 6:52 PM, Matt McCutchen  wrote:
> Hi Stefan,
>
> I appreciate the effort to remove obstacles to the use of submodules!
>  It looks like a custom tool is probably still our best option at this
> time, though we can always switch back to submodules later.
>
> On Wed, 2016-10-26 at 16:23 -0700, Stefan Beller wrote:
>> On Wed, Oct 26, 2016 at 4:07 PM, Matt McCutchen  
>> wrote:
>> > - We have to make separate commits and manage corresponding topic
>> > branches for the superproject and subprojects.
>>
>> Well yeah, that is how submodule work on a conceptual level.
>> While having multiple commits may seem like overhead, note
>> the subtle difference for these commits. One if deep down in the
>> stack patching one of the submodules, the other is a high level
>> commit advancing the submodule pointer.
>>
>> Note that the target audience of these two commit messages
>> might be vastly different, hence can be worded differently.
>> (The submodule describing how you fixed e.g. a memleak or race condition
>> and the superproject describes on why you needed to include that submodule,
>> e.g. because you switched your toplevel application to use threads.)
>
> I understand one can adhere to that philosophy, but I don't see the
> practical benefit of doing so in our case compared to using a "git
> subtree"-like approach and making a single commit.  It would just be us
> looking at both commits.  If we do upstream any of the library changes,
> we'll probably have to rework them anyway.
>
>> > - A diff of the superproject doesn't include the content of
>> > subprojects.
>
>> Although this is just Git, you probably also have a code review system that
>> would need that change as well.
>
> Indeed.  We currently use Bitbucket.  I'd be open to switching, though
> maybe not just for this.
>
>> Is there anything else besides these 3 points that encourages you to
>> switch away from submodules?
>
> Those 3 are the ongoing pain points I can think of.  There are a few
> other drawbacks compared to "git subtree" that come up less often:
>
> 1b. On another project, I was working with a teammate who was new to
> version control and not very careful, who forgot to run "git submodule
> update" and ended up committing back the old submodule pointer.
>  Thankfully, this hasn't happened yet on my current project.
>
> 4. I pushed several dangling submodule pointers before I learned I
> could set push.recurseSubmodules = check.  This isn't the default; each
> developer has to do it manually.  (In theory, I could put such things
> in a setup script for them to run if they trust me.)

There is a current series in flight/for review that makes "check" default.
(It is blocked as check has some performance issues when having lots
of commits to be pushed, so it may take a while and not show up in the
next release)

>
> 5. Stashing changes to both the superproject and the subproject takes
> more steps.

True, so you'd want to have a `git stash --recurse-submodules={yes,no}`
where the command line option is configurable, so you don't have to type
it all the time?

>
> 6. I use multiple worktrees (with "git worktree") because of #5 and
> also so that I can run two versions of the application at the same time
> and compare the behavior.  Using "git worktree" with submodules is
> officially unsupported, though I was able to get things working by
> manually editing some files.

Heh, true. I made an attempt on fixing git worktree a few weeks ago, but that
did not go anywhere, but it's still on the TODO list.

You can use git clone --reference with --recurse-submodule though when
having bandwidth concerns. It doesn't deliver the full worktree experience
though as it would be separate clones with shared object store.

>
> 7. We have to set up a repository for each subproject on our hosting
> service.  Anyone who forks our application and modifies a subproject
> has to set up a subproject repository and carry a change to .gitmodules
> to point to their repository.  If we use relative URLs in .gitmodules,
> it's even worse: anyone who forks our application has to set up
> repositories for all the subprojects, even if they don't modify them.
>
> Matt

Yeah the model of referencing in gitmodules is conceptually broken.
I don't even claim it is on my todo list. ;)

Thanks for pointing out the issues though. they align to what
we plan on doing for submodules, so ... the plan actually makes
sense :)

Thanks,
Stefan


Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Matt McCutchen
Hi Stefan,

I appreciate the effort to remove obstacles to the use of submodules!
 It looks like a custom tool is probably still our best option at this
time, though we can always switch back to submodules later.

On Wed, 2016-10-26 at 16:23 -0700, Stefan Beller wrote:
> On Wed, Oct 26, 2016 at 4:07 PM, Matt McCutchen  
> wrote:
> > - We have to make separate commits and manage corresponding topic
> > branches for the superproject and subprojects.
> 
> Well yeah, that is how submodule work on a conceptual level.
> While having multiple commits may seem like overhead, note
> the subtle difference for these commits. One if deep down in the
> stack patching one of the submodules, the other is a high level
> commit advancing the submodule pointer.
> 
> Note that the target audience of these two commit messages
> might be vastly different, hence can be worded differently.
> (The submodule describing how you fixed e.g. a memleak or race condition
> and the superproject describes on why you needed to include that submodule,
> e.g. because you switched your toplevel application to use threads.)

I understand one can adhere to that philosophy, but I don't see the
practical benefit of doing so in our case compared to using a "git
subtree"-like approach and making a single commit.  It would just be us
looking at both commits.  If we do upstream any of the library changes,
we'll probably have to rework them anyway.

> > - A diff of the superproject doesn't include the content of
> > subprojects.

> Although this is just Git, you probably also have a code review system that
> would need that change as well.

Indeed.  We currently use Bitbucket.  I'd be open to switching, though
maybe not just for this.

> Is there anything else besides these 3 points that encourages you to
> switch away from submodules?

Those 3 are the ongoing pain points I can think of.  There are a few
other drawbacks compared to "git subtree" that come up less often:

1b. On another project, I was working with a teammate who was new to
version control and not very careful, who forgot to run "git submodule
update" and ended up committing back the old submodule pointer.
 Thankfully, this hasn't happened yet on my current project.

4. I pushed several dangling submodule pointers before I learned I
could set push.recurseSubmodules = check.  This isn't the default; each
developer has to do it manually.  (In theory, I could put such things
in a setup script for them to run if they trust me.)

5. Stashing changes to both the superproject and the subproject takes
more steps.

6. I use multiple worktrees (with "git worktree") because of #5 and
also so that I can run two versions of the application at the same time
and compare the behavior.  Using "git worktree" with submodules is
officially unsupported, though I was able to get things working by
manually editing some files.

7. We have to set up a repository for each subproject on our hosting
service.  Anyone who forks our application and modifies a subproject
has to set up a subproject repository and carry a change to .gitmodules
to point to their repository.  If we use relative URLs in .gitmodules,
it's even worse: anyone who forks our application has to set up
repositories for all the subprojects, even if they don't modify them.

Matt


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> To explain, you can either have:
> struct git_attr_result result[2];
> or
> struct git_attr_result *result = git_attr_result_alloc(check);
> and both are running just fine in a thread. However you should not
> make that variable static. But maybe that is too much common sense
> and hence confusing.

Yup, if you spelled it out that does make sense, but the description
given in the patch was just misleading.

Thanks.


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

>  extern struct git_attr *git_attr(const char *);
> ...
> +extern void git_attr_check_append(struct git_attr_check *,
> +   const struct git_attr *);

Another thing.  Do we still need to expose git_attr() to the outside
callers?  If we change git_attr_check_append() to take "const char *"
as its second parameter, can we retire it from the public interface?

It being an "intern" function, by definition it is not thread-safe.
Its use from prepare_attr_stack() inside git_check_attr() that takes
the Big Attribute Subsystem Lock is safe, but the callers that do

struct git_attr_check *check = ...;
struct git_attr *text_attr = git_attr("text");

git_attr_check_append(check, text_attr);

would risk racing to register the same entry to the "all the
attributes in the known universe" table.  

If the attribute API does not have to expose git_attr(const char *)
to the external callers at all, that would be ideal.  Otherwise, we
would need to rename the current git_attr() that is used internally
under the Big Lock to 

static struct git_attr *git_attr_locked(const char*)

that is defined inside attr.c, and then provide the external version
as a thin wrapper that calls it under the Big Lock.




Expanding Includes in .gitignore

2016-10-26 Thread Aaron Pelly
I want a feature. It may be a bad-idea(tm). Advice appreciated.

I want git to be able to include, in its gitignore files, sub-files of
ignores or have it understand a directory of ignore files. Or both.

The use case for this is where I did not write my own rules, but I want
to keep them updated. https://github.com/github/gitignore is a damn good
resource, but I want to pull it and include relevant bits project by
project and/or system wide. I don't want to have to update many projects
manually if that, or any other, repo changes.

A very brief look at dir.c would indicate that a recursive call from
add_excludes to itself when it parses some sort of include tag would do
it within a file. I'm sure it'd be pretty straight forward to hook into
something in dir.c to parse directories too.

I'm thinking something like ". path/to/include/file" in an ignore file,
and/or creating .gitignore.d and/or allowing $HOME/.config/git/ignore
and $GIT_DIR/info/exclude to be directories. Or some sane and consistent
mixture of these things.

In the case of a directory the plan would be to add links to files
stored/sourced elsewhere. This does pose a precedence question which I
haven't thought about yet, but probably makes it too hard for the
limited value it brings.

There is also the issue of malicious/accidental recursion which I
haven't thought about deeply either.

I would like to know the desirability/practicality/stupidity of such a
feature as I believe it is within my skillset to implement it.


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 5:16 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +* Allocate an array of `struct git_attr_result` either on the stack
>> +  or via `git_attr_result_alloc` on the heap when the result size
>> +  is not known at compile time. The call to initialize
>>the result is not thread safe, because different threads need their
>>own thread local result anyway.
>
> Do you want to keep the last sentence?  "The call to initialize the
> result is not thread safe..."?  Is that true?

I'll drop that sentence, as it overstates the situation.

To explain, you can either have:
struct git_attr_result result[2];
or
struct git_attr_result *result = git_attr_result_alloc(check);
and both are running just fine in a thread. However you should not
make that variable static. But maybe that is too much common sense
and hence confusing.

>
>> @@ -103,7 +105,7 @@ To see how attributes "crlf" and "ident" are set
>> for different paths.
>>  const char *path;
>>  struct git_attr_result result[2];
>>
>> -git_check_attr(path, check, result);
>> +git_check_attr(path, , result);
>
> What's the point of this change?  Isn't check typically a pointer
> already?

This ought to go to

git_attr_check_initl(, "crlf", "ident", NULL);

instead.

>


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> +* Allocate an array of `struct git_attr_result` either on the stack
> +  or via `git_attr_result_alloc` on the heap when the result size
> +  is not known at compile time. The call to initialize
>the result is not thread safe, because different threads need their
>own thread local result anyway.

Do you want to keep the last sentence?  "The call to initialize the
result is not thread safe..."?  Is that true?

> @@ -103,7 +105,7 @@ To see how attributes "crlf" and "ident" are set
> for different paths.
>  const char *path;
>  struct git_attr_result result[2];
>
> -git_check_attr(path, check, result);
> +git_check_attr(path, , result);

What's the point of this change?  Isn't check typically a pointer
already?



Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 4:14 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> @@ -53,19 +57,32 @@ value of the attribute for the path.
>>  Querying Specific Attributes
>>  
>>
>> -* Prepare `struct git_attr_check` using git_attr_check_initl()
>> +* Prepare a `struct git_attr_check` using `git_attr_check_initl()`
>>function, enumerating the names of attributes whose values you are
>>interested in, terminated with a NULL pointer.  Alternatively, an
>> -  empty `struct git_attr_check` can be prepared by calling
>> -  `git_attr_check_alloc()` function and then attributes you want to
>> -  ask about can be added to it with `git_attr_check_append()`
>> -  function.
>> -
>> -* Call `git_check_attr()` to check the attributes for the path.
>> -
>> -* Inspect `git_attr_check` structure to see how each of the
>> -  attribute in the array is defined for the path.
>> -
>> +  empty `struct git_attr_check` as allocated by git_attr_check_alloc()
>> +  can be prepared by calling `git_attr_check_alloc()` function and
>> +  then attributes you want to ask about can be added to it with
>> +  `git_attr_check_append()` function.
>
> I think that my version that was discarded forbade appending once
> you started to use the check for querying, because the check was
> meant to be used as a key for an attr-stack and the check-specific
> attr-stack was planned to keep only the attrs the check is
> interested in (and appending is to change the set of attrs the check
> is interested in, invalidating the attr-stack at that point).
>
> If you lost that restriction, that is good (I didn't check the
> implementation, though).  Otherwise we'd need to say something here.

That restriction still applies. Though I think mentioning it in the
paragraph where we describe querying makes more sense.

> initialization?

done


>
> Grammo?  "either on the stack, or dynamically in the heap"?

done

>
> Having result defined statically is not thread safe for that
> reason.  It is not clear what you mean by "The call to initialize
> the result"; having it on the stack or have one dynamically on the
> heap ought to be thread safe.

done

>> -}
>> + static struct git_attr_check *check;
>> + git_attr_check_initl(check, "crlf", "ident", NULL);
>
> I think you are still missing "&" here.

done

>> + if (check)
>> + return; /* already done */
>>   check = git_attr_check_alloc();
>
> You may want to say that this is thread-unsafe.

It is not; see the implementation:

void git_attr_check_append(struct git_attr_check *check,
   const struct git_attr *attr)
{
int i;
if (check->finalized)
die("BUG: append after git_attr_check structure is finalized");
if (!attr_check_is_dynamic(check))
die("BUG: appending to a statically initialized git_attr_check");
attr_lock();
for (i = 0; i < check->check_nr; i++)
if (check->attr[i] == attr)
break;
if (i == check->check_nr) {
ALLOC_GROW(check->attr, check->check_nr + 1, check->check_alloc);
check->attr[check->check_nr++] = attr;
}
attr_unlock();
}

>> +* Call `git_all_attrs()`.
>
> Hmph, the caller does not know what attribute it is interested in,
> and it is unclear "how" the former needs to be set up from this
> description.  Should it prepare an empty one that can be appended?
>

Good point on clarifying this one. It is just needed to have
NULL pointers around:

struct git_attr_check *check = NULL;
struct git_attr_result *result = NULL;
git_all_attrs(full_path, _check, );
// proceed as in the case above

All comments from above yield the following diff:

diff --git a/Documentation/technical/api-gitattributes.txt
b/Documentation/technical/api-gitattributes.txt
index 4d8ef93..d221736 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -67,19 +67,21 @@ Querying Specific Attributes
   Both ways with `git_attr_check_initl()` as well as the
   alloc and append route are thread safe, i.e. you can call it
   from different threads at the same time; when check determines
-  the initialzisation is still needed, the threads will use a
+  the initialization is still needed, the threads will use a
   single global mutex to perform the initialization just once, the
   others will wait on the the thread to actually perform the
   initialization.

-* Allocate an array of `struct git_attr_result` either statically on the
-  as a variable on the stack or dynamically via `git_attr_result_alloc`
-  when the result size is not known at compile time. The call to initialize
+* Allocate an array of `struct git_attr_result` either on the stack
+  or via `git_attr_result_alloc` on the heap when the result size
+  is not known at compile time. The call to initialize
   the result is not thread safe, because different threads need their
   own thread local result anyway.

Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

>> - We have to make separate commits and manage corresponding topic
>> branches for the superproject and subprojects.
>
> Well yeah, that is how submodule work on a conceptual level.
> While having multiple commits may seem like overhead, note
> the subtle difference for these commits. One if deep down in the
> stack patching one of the submodules, the other is a high level
> commit advancing the submodule pointer.
>
> Note that the target audience of these two commit messages
> might be vastly different, hence can be worded differently.
> (The submodule describing how you fixed e.g. a memleak or race condition
> and the superproject describes on why you needed to include that submodule,
> e.g. because you switched your toplevel application to use threads.)

Both good points. 

Another thing to keep in mind is that in a well-organized project,
it is expected that you would have multiple commits in a submodule,
solving one single issue that is needed by the superproject in a
finer grained way, before the resulting submodule tip is recorded in
the tree of the superproject in one commit.  IOW, between the time
the superproject's history moves by one commit, the submodule may
have multiple commits in order for the submodule to become ready to
be consumed by the superproject.




Re: "git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 4:07 PM, Matt McCutchen  wrote:
> I'm the lead developer of a research software application (https://bitb
> ucket.org/objsheets/objsheets) that uses modified versions of two
> third-party libraries, which we need to version and distribute along
> with our application.  For better or for worse, we haven't made it a
> priority to upstream our changes, so for now we just want to optimize
> for ease of (1) making and reviewing changes and (2) upgrading to newer
> upstream versions.
>
> We've been using git submodules, but that's a pain for several reasons:
> - We have to run "git submodule update" manually.

That is true for now. :( But there are plans to revive a patch series
to include updating the submodule into git-checkout.

https://github.com/stefanbeller/git/commits/submodule-co

> - We have to make separate commits and manage corresponding topic
> branches for the superproject and subprojects.

Well yeah, that is how submodule work on a conceptual level.
While having multiple commits may seem like overhead, note
the subtle difference for these commits. One if deep down in the
stack patching one of the submodules, the other is a high level
commit advancing the submodule pointer.

Note that the target audience of these two commit messages
might be vastly different, hence can be worded differently.
(The submodule describing how you fixed e.g. a memleak or race condition
and the superproject describes on why you needed to include that submodule,
e.g. because you switched your toplevel application to use threads.)

> - A diff of the superproject doesn't include the content of
> subprojects.

A recent patch series by Jacob Keller (jk/diff-submodule-diff-inline)
taught git diff to show the diff in the submodule as well, see
commits that are merged in 305d7f133956a5f43c94d938beabbfbb0ac1753c
or:
https://kernel.googlesource.com/pub/scm/git/git/+/fd47ae6a5b9cc0cfc56c1f7c43db612d26ca4b75%5E%21/#F1

Although this is just Git, you probably also have a code review system that
would need that change as well. Also it is not part of any release yet, but
soon will be.

Is there anything else besides these 3 points that encourages you to
switch away from submodules?

Thanks,
Stefan


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> @@ -53,19 +57,32 @@ value of the attribute for the path.
>  Querying Specific Attributes
>  
>  
> -* Prepare `struct git_attr_check` using git_attr_check_initl()
> +* Prepare a `struct git_attr_check` using `git_attr_check_initl()`
>function, enumerating the names of attributes whose values you are
>interested in, terminated with a NULL pointer.  Alternatively, an
> -  empty `struct git_attr_check` can be prepared by calling
> -  `git_attr_check_alloc()` function and then attributes you want to
> -  ask about can be added to it with `git_attr_check_append()`
> -  function.
> -
> -* Call `git_check_attr()` to check the attributes for the path.
> -
> -* Inspect `git_attr_check` structure to see how each of the
> -  attribute in the array is defined for the path.
> -
> +  empty `struct git_attr_check` as allocated by git_attr_check_alloc()
> +  can be prepared by calling `git_attr_check_alloc()` function and
> +  then attributes you want to ask about can be added to it with
> +  `git_attr_check_append()` function.

I think that my version that was discarded forbade appending once
you started to use the check for querying, because the check was
meant to be used as a key for an attr-stack and the check-specific
attr-stack was planned to keep only the attrs the check is
interested in (and appending is to change the set of attrs the check
is interested in, invalidating the attr-stack at that point).  

If you lost that restriction, that is good (I didn't check the
implementation, though).  Otherwise we'd need to say something here.

> +  Both ways with `git_attr_check_initl()` as well as the
> +  alloc and append route are thread safe, i.e. you can call it
> +  from different threads at the same time; when check determines
> +  the initialzisation is still needed, the threads will use a

initialization?

> +  single global mutex to perform the initialization just once, the
> +  others will wait on the the thread to actually perform the
> +  initialization.
> +
> +* Allocate an array of `struct git_attr_result` either statically on the
> +  as a variable on the stack or dynamically via `git_attr_result_alloc`

Grammo?  "either on the stack, or dynamically in the heap"?

> +  when the result size is not known at compile time. The call to initialize
> +  the result is not thread safe, because different threads need their
> +  own thread local result anyway.

Having result defined statically is not thread safe for that
reason.  It is not clear what you mean by "The call to initialize
the result"; having it on the stack or have one dynamically on the
heap ought to be thread safe.

> +* Call `git_check_attr()` to check the attributes for the path,
> +  the given `git_attr_result` will be filled with the result.
> +
> +* Inspect each `git_attr_result` structure to see how
> +  each of the attribute in the array is defined for the path.
>  
>  Example
>  ---
> @@ -76,28 +93,23 @@ To see how attributes "crlf" and "ident" are set for 
> different paths.
>we are checking two attributes):
>  
>  
> -static struct git_attr_check *check;
> -static void setup_check(void)
> -{
> - if (check)
> - return; /* already done */
> - check = git_attr_check_initl("crlf", "ident", NULL);
> -}
> + static struct git_attr_check *check;
> + git_attr_check_initl(check, "crlf", "ident", NULL);

I think you are still missing "&" here.

>  
>  
>  . Call `git_check_attr()` with the prepared `struct git_attr_check`:
>  
>  
>   const char *path;
> + struct git_attr_result result[2];
>  
> - setup_check();
> - git_check_attr(path, check);
> + git_check_attr(path, check, result);
>  
>  
> -. Act on `.value` member of the result, left in `check->check[]`:
> +. Act on `result.value[]`:
>  
>  
> - const char *value = check->check[0].value;
> + const char *value = result.value[0];
>  
>   if (ATTR_TRUE(value)) {
>   The attribute is Set, by listing only the name of the
> @@ -123,12 +135,15 @@ the first step in the above would be different.
>  static struct git_attr_check *check;
>  static void setup_check(const char **argv)
>  {
> + if (check)
> + return; /* already done */
>   check = git_attr_check_alloc();

You may want to say that this is thread-unsafe.

>   while (*argv) {
>   struct git_attr *attr = git_attr(*argv);
>   git_attr_check_append(check, attr);
>   argv++;
>   }
> + struct git_attr_result *result = git_attr_result_alloc(check);
>  }
>  
>  
> @@ -138,17 +153,20 @@ Querying All Attributes
>  
>  To get the values of all attributes associated with a file:
>  
> +* Setup a local variables for the question
> +  `struct git_attr_check` as well as a pointer where the result
> +  `struct git_attr_result` will be stored.
> +* Call 

Re: [PATCH 28/36] attr: keep attr stack for each check

2016-10-26 Thread Stefan Beller
On Sun, Oct 23, 2016 at 8:10 AM, Ramsay Jones
 wrote:
>> +
>> +struct hashmap all_attr_stacks;
>> +int all_attr_stacks_init;
>
> Mark symbols 'all_attr_stacks' and 'all_attr_stacks_init' with
> the static keyword. (ie. these are file-local symbols).
>
> ATB,
> Ramsay Jones
>

This fix will appear in a resend of the series.

Thanks,
Stefan


"git subtree --squash" interacts poorly with revert, merge, and rebase

2016-10-26 Thread Matt McCutchen
I'm the lead developer of a research software application (https://bitb
ucket.org/objsheets/objsheets) that uses modified versions of two
third-party libraries, which we need to version and distribute along
with our application.  For better or for worse, we haven't made it a
priority to upstream our changes, so for now we just want to optimize
for ease of (1) making and reviewing changes and (2) upgrading to newer
upstream versions.

We've been using git submodules, but that's a pain for several reasons:
- We have to run "git submodule update" manually.
- We have to make separate commits and manage corresponding topic
branches for the superproject and subprojects.
- A diff of the superproject doesn't include the content of
subprojects.

Recently I looked into switching to the "git subtree" contrib tool in
the --squash mode, but I identified a few drawbacks compared to
submodules:

1. The upstream commit on which the subtree is based is assumed to be
given by the latest squash commit in "git log".  This means that (i) a
change to a different upstream commit can't be reverted with "git
revert" and (ii) a "git merge" of two superproject branches based on
different upstream commits may successfully merge the content of the
upstream commits but leave the tool thinking the subtree is based on an
arbitrary one of the two commits.

2. Rebasing messes up the merge commits generated by "git subtree --
squash".  --preserve-merges worked in a simple test but supposedly
doesn't work if there are conflicts or I want to reorder commits with
--interactive.

Maybe we would never hit any of these problems in practice, but they
give me a bad enough feeling that I'm planning to write my own tool
that tracks the upstream commit ID in a file (like a submodule) and
doesn't generate any extra commits.  Without generating extra commits,
the only place to store the upstream content in the superproject would
be in another subtree, which would take up disk space in every working
tree unless developers manually set skip-worktree.  I think I prefer to
not store the upstream content and just have the tool fetch it from a
local subproject repository each time it's needed.

I'll of course post the tool on the web and would be happy to see it
integrated into "git subtree" if that makes sense, but I don't know how
much time I'd be willing to put into making that happen.

Any advice?

Thanks,
Matt


Re: What's cooking in git.git (Oct 2016, #07; Wed, 26)

2016-10-26 Thread Junio C Hamano
Jonathan Tan  writes:

> On 10/26/2016 03:29 PM, Junio C Hamano wrote:
>> * jt/trailer-with-cruft (2016-10-21) 8 commits
>>  - trailer: support values folded to multiple lines
>>  - trailer: forbid leading whitespace in trailers
>>  - trailer: allow non-trailers in trailer block
>>  - trailer: clarify failure modes in parse_trailer
>>  - trailer: make args have their own struct
>>  - trailer: streamline trailer item create and add
>>  - trailer: use list.h for doubly-linked list
>>  - trailer: improve const correctness
>>
>>  Update "interpret-trailers" machinery and teaches it that people in
>>  real world write all sorts of crufts in the "trailer" that was
>>  originally designed to have the neat-o "Mail-Header: like thing"
>>  and nothing else.
>>
>>  Waiting for review.
>
> For this patch set, should I look for another reviewer? In the e-mail
> thread [1], you and Stefan seemed positive about this patch set.

Ah, I meant "waiting for the review to conclude".  If nobody has any
more comments, I am OK to start cooking it in 'next'.

Thanks.


[PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
This revamps the API of the attr subsystem to be thread safe.
Before we had the question and its results in one struct type.
The typical usage of the API was

static struct git_attr_check *check;

if (!check)
check = git_attr_check_initl("text", NULL);

git_check_attr(path, check);
act_on(check->value[0]);

This has a couple of issues when it comes to thread safety:

* the initialization is racy in this implementation. To make it
  thread safe, we need to acquire a mutex, such that only one
  thread is executing the code in git_attr_check_initl.
  As we do not want to introduce a mutex at each call site,
  this is best done in the attr code. However to do so, we need
  to have access to the `check` variable, i.e. the API has to
  look like
git_attr_check_initl(struct git_attr_check*, ...);
  Then one of the threads calling git_attr_check_initl will
  acquire the mutex and init the `check`, while all other threads
  will wait on the mutex just to realize they're late to the
  party and they'll return with no work done.

* While the check for attributes to be questioned only need to
  be initalized once as that part will be read only after its
  initialisation, the answer may be different for each path.
  Because of that we need to decouple the check and the answer,
  such that each thread can obtain an answer for the path it
  is currently processing.

This commit changes the API and adds locking in
git_attr_check_initl that provides the thread safety for constructing
`struct git_attr_check`.

The usage of the new API will be:

/*
 * The initl call will thread-safely check whether the
 * struct git_attr_check has been initialized. We only
 * want to do the initialization work once, hence we do
 * that work inside a thread safe environment.
 */
static struct git_attr_check *check;
git_attr_check_initl(, "text", NULL);

/* We're just asking for one attribute "text". */
git_attr_result myresult[1];

/* Perform the check and act on it: */
git_check_attr(path, check, myresult);
act_on(myresult[0].value);

/*
 * No need to free the check as it is static, hence doesn't leak
 * memory. The result is also static, so no need to free there either.
 */

Signed-off-by: Stefan Beller 
---

In the reroll of this series, I plan to use the patch as-is below:

* Documentation/commit message matches actual implementation
* initialize the attr lock statically. It was uninitialized in the first
  version, which works for me on Linux, but not so in Windows; I split off
  the static initialized mutexes on Windows as a separate patch, see
  
https://public-inbox.org/git/cagz79kzryb-5jgif04btk1v9tgfj-tnahquk+1lb7xeecu7...@mail.gmail.com
  which would be a requirement for this patch.

 Documentation/technical/api-gitattributes.txt |  94 ++---
 archive.c |  11 +-
 attr.c| 143 ++
 attr.h|  71 -
 builtin/check-attr.c  |  35 ---
 builtin/pack-objects.c|  16 +--
 convert.c |  40 +++
 ll-merge.c|  24 +++--
 userdiff.c|  16 +--
 ws.c  |   8 +-
 10 files changed, 280 insertions(+), 178 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt 
b/Documentation/technical/api-gitattributes.txt
index 92fc32a..4d8ef93 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,15 +16,19 @@ Data Structure
of no interest to the calling programs.  The name of the
attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check_elem`::
-
-   This structure represents one attribute and its value.
-
 `struct git_attr_check`::
 
-   This structure represents a collection of `git_attr_check_elem`.
+   This structure represents a collection of `struct git_attrs`.
It is passed to `git_check_attr()` function, specifying the
-   attributes to check, and receives their values.
+   attributes to check, and receives their values into a corresponding
+   `struct git_attr_result`.
+
+`struct git_attr_result`::
+
+   This structure represents one results for a check, such that an
+   array of `struct git_attr_results` corresponds to a
+   `struct git_attr_check`. The answers given in that array are in
+   the the same order as the check struct.
 
 
 Attribute Values
@@ -32,7 +36,7 @@ Attribute Values
 
 An attribute for a path can be in one of four states: Set, Unset,
 Unspecified or set to a string, and `.value` member of `struct
-git_attr_check` records it.  There are three macros to check these:
+git_attr_result` records it.  There 

Re: What's cooking in git.git (Oct 2016, #07; Wed, 26)

2016-10-26 Thread Jonathan Tan

On 10/26/2016 03:29 PM, Junio C Hamano wrote:

* jt/trailer-with-cruft (2016-10-21) 8 commits
 - trailer: support values folded to multiple lines
 - trailer: forbid leading whitespace in trailers
 - trailer: allow non-trailers in trailer block
 - trailer: clarify failure modes in parse_trailer
 - trailer: make args have their own struct
 - trailer: streamline trailer item create and add
 - trailer: use list.h for doubly-linked list
 - trailer: improve const correctness

 Update "interpret-trailers" machinery and teaches it that people in
 real world write all sorts of crufts in the "trailer" that was
 originally designed to have the neat-o "Mail-Header: like thing"
 and nothing else.

 Waiting for review.


For this patch set, should I look for another reviewer? In the e-mail 
thread [1], you and Stefan seemed positive about this patch set.


(I've also checked [2] that this is not merged yet.)

[1] 
[2] git log --grep='jt/trailer'


What's cooking in git.git (Oct 2016, #07; Wed, 26)

2016-10-26 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/gitweb-abbrev-links (2016-10-14) 3 commits
  (merged to 'next' on 2016-10-17 at 4868def05e)
 + gitweb: link to "git describe"'d commits in log messages
 + gitweb: link to 7-char+ SHA-1s, not only 8-char+
 + gitweb: fix a typo in a comment

 In addition to purely abbreviated commit object names, "gitweb"
 learned to turn "git describe" output (e.g. v2.9.3-599-g2376d31787)
 into clickable links in its output.


* bw/ls-files-recurse-submodules (2016-10-10) 4 commits
  (merged to 'next' on 2016-10-17 at f0e398946a)
 + ls-files: add pathspec matching for submodules
 + ls-files: pass through safe options for --recurse-submodules
 + ls-files: optionally recurse into submodules
 + git: make super-prefix option

 "git ls-files" learned "--recurse-submodules" option that can be
 used to get a listing of tracked files across submodules (i.e. this
 only works with "--cached" option, not for listing untracked or
 ignored files).  This would be a useful tool to sit on the upstream
 side of a pipe that is read with xargs to work on all working tree
 files from the top-level superproject.


* bw/submodule-branch-dot-doc (2016-10-19) 1 commit
  (merged to 'next' on 2016-10-21 at 18aad25ba8)
 + submodules doc: update documentation for "." used for submodule branches

 Recent git allows submodule..branch to use a special token
 "." instead of the branch name; the documentation has been updated
 to describe it.


* dk/worktree-dup-checkout-with-bare-is-ok (2016-10-14) 1 commit
  (merged to 'next' on 2016-10-17 at 24594d3e56)
 + worktree: allow the main brach of a bare repository to be checked out

 In a worktree connected to a repository elsewhere, created via "git
 worktree", "git checkout" attempts to protect users from confusion
 by refusing to check out a branch that is already checked out in
 another worktree.  However, this also prevented checking out a
 branch, which is designated as the primary branch of a bare
 reopsitory, in a worktree that is connected to the bare
 repository.  The check has been corrected to allow it.


* ex/deprecate-empty-pathspec-as-match-all (2016-06-22) 1 commit
  (merged to 'next' on 2016-09-21 at e19148ea63)
 + pathspec: warn on empty strings as pathspec

 Originally merged to 'next' on 2016-07-13

 An empty string used as a pathspec element has always meant
 'everything matches', but it is too easy to write a script that
 finds a path to remove in $path and run 'git rm "$paht"', which
 ends up removing everything.  Start warning about this use of an
 empty string used for 'everything matches' and ask users to use a
 more explicit '.' for that instead.

 The hope is that existing users will not mind this change, and
 eventually the warning can be turned into a hard error, upgrading
 the deprecation into removal of this (mis)feature.


* jc/cocci-xstrdup-or-null (2016-10-12) 1 commit
  (merged to 'next' on 2016-10-17 at 55ceaa465a)
 + cocci: refactor common patterns to use xstrdup_or_null()

 Code cleanup.


* jc/diff-unique-abbrev-comments (2016-09-30) 1 commit
  (merged to 'next' on 2016-10-17 at c7fb286102)
 + diff_unique_abbrev(): document its assumption and limitation
 (this branch is used by jk/no-looking-at-dotgit-outside-repo.)

 A bit more comments in a tricky code.


* jc/ws-error-highlight (2016-10-04) 4 commits
  (merged to 'next' on 2016-10-17 at ecbdc57d77)
 + diff: introduce diff.wsErrorHighlight option
 + diff.c: move ws-error-highlight parsing helpers up
 + diff.c: refactor parse_ws_error_highlight()
 + t4015: split out the "setup" part of ws-error-highlight test

 "git diff/log --ws-error-highlight=" lacked the corresponding
 configuration variable to set it by default.


* jk/ambiguous-short-object-names (2016-10-12) 1 commit
  (merged to 'next' on 2016-10-19 at e7c55a9da5)
 + t1512: become resilient to GETTEXT_POISON build

 A test fixup to recently graduated topic.


* jk/diff-submodule-diff-inline (2016-10-20) 1 commit
  (merged to 'next' on 2016-10-21 at 13f300805e)
 + rev-list: use hdr_termination instead of a always using a newline

 A recently graduated topic regressed "git rev-list --header"
 output, breaking "gitweb".  This has been fixed.


* jk/fetch-quick-tag-following (2016-10-14) 1 commit
  (merged to 'next' on 2016-10-19 at d7718dcdf5)
 + fetch: use "quick" has_sha1_file for tag following

 When fetching from a remote that has many tags that are irrelevant
 to branches we are following, we used to waste way too many cycles
 when checking if the object 

Re: [PATCH] compat: Allow static initializer for pthreads on Windows

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 2:57 PM, Stefan Beller  wrote:
> In Windows it is not possible to have a static initialized mutex as of
> now, but that seems to be painful for the upcoming refactoring of the
> attribute subsystem, as we have no good place to put the initialization
> of the attr global lock.
>
> The trick is to get a named mutex as CreateMutex[1] will return the
> existing named mutex if it exists in a thread safe way, or return
> a newly created mutex with that name.
>
> Inside the critical section of the single named mutex, we need to double
> check if the mutex was already initialized because the outer check is
> not sufficient.
> (e.g. 2 threads enter the first condition `(!a)` at the same time, but
> only one of them will acquire the named mutex first and proceeds to
> initialize the given mutex a. The second thread shall not re-initialize
> the given mutex `a`, which is why we have the inner condition on `(!a)`.
>
> Due to the use of memory barriers inside the critical section the mutex
> `a` gets updated to other threads, such that any further invocation
> will skip the initialization check code altogether on the first condition.
>
> [1] 
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms682411(v=vs.85).aspx
>
> Signed-off-by: Stefan Beller 
> ---
>
>  Flying blind here, i.e. not compiled, not tested. For a system I do not
>  have deep knowledge of. The only help was the online documentation.

This is of course a Double Check Locking Pattern, that Johannes warned about
a couple of days ago. However according to
https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
we can make it work by adding enough memory barriers, (See section
"Making it work with explicit memory barriers").


[PATCH] compat: Allow static initializer for pthreads on Windows

2016-10-26 Thread Stefan Beller
In Windows it is not possible to have a static initialized mutex as of
now, but that seems to be painful for the upcoming refactoring of the
attribute subsystem, as we have no good place to put the initialization
of the attr global lock.

The trick is to get a named mutex as CreateMutex[1] will return the
existing named mutex if it exists in a thread safe way, or return
a newly created mutex with that name.

Inside the critical section of the single named mutex, we need to double
check if the mutex was already initialized because the outer check is
not sufficient.
(e.g. 2 threads enter the first condition `(!a)` at the same time, but
only one of them will acquire the named mutex first and proceeds to
initialize the given mutex a. The second thread shall not re-initialize
the given mutex `a`, which is why we have the inner condition on `(!a)`.

Due to the use of memory barriers inside the critical section the mutex
`a` gets updated to other threads, such that any further invocation
will skip the initialization check code altogether on the first condition.

[1] 
https://msdn.microsoft.com/en-us/library/windows/desktop/ms682411(v=vs.85).aspx

Signed-off-by: Stefan Beller 
---

 Flying blind here, i.e. not compiled, not tested. For a system I do not
 have deep knowledge of. The only help was the online documentation.

 compat/win32/pthread.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 1c16408..a900513 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -21,9 +21,25 @@
 static inline int return_0(int i) {
return 0;
 }
+
+#define PTHREAD_MUTEX_INITIALIZER NULL
 #define pthread_mutex_init(a,b) return_0((InitializeCriticalSection((a)), 0))
 #define pthread_mutex_destroy(a) DeleteCriticalSection((a))
-#define pthread_mutex_lock EnterCriticalSection
+#define pthread_mutex_lock(a) \
+{ \
+   if (!a) { \
+   HANDLE p = CreateMutex(NULL, FALSE, 
"Git-Global-Windows-Mutex"); \
+   EnterCriticalSection(p); \
+   MemoryBarrier(); \
+   if (!a)
+   pthread_mutex_init(a); \
+   MemoryBarrier(); \
+   ReleaseMutex(p); \
+   } \
+   EnterCriticalSection(a); \
+}
+
+
 #define pthread_mutex_unlock LeaveCriticalSection
 
 typedef int pthread_mutexattr_t;
-- 
2.10.1.508.g6572022



[PATCH] attr: expose error reporting function for invalid attribute names

2016-10-26 Thread Stefan Beller
From: Junio C Hamano 

Export invalid_attr_name_message() function that returns the
message to be given when a given  pair
is not a good name for an attribute.

We could later update the message to exactly spell out what the
rules for a good attribute name are, etc.

We do not need to export the validity check 'attr_name_valid()' itself
as we will learn about the validity indirectly in a later patch
via calling 'git_attr_counted()'.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---

 Ramsay,
 I intend to replace the previous
 
 [PATCH 17/36] attr: expose validity check for attribute names
 
 by this one in a reroll.
 
 Thanks,
 Stefan

 attr.c | 39 +--
 attr.h |  2 ++
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/attr.c b/attr.c
index 90dbacd..ec878c3 100644
--- a/attr.c
+++ b/attr.c
@@ -59,23 +59,38 @@ static unsigned hash_name(const char *name, int namelen)
return val;
 }
 
-static int invalid_attr_name(const char *name, int namelen)
+static int attr_name_valid(const char *name, size_t namelen)
 {
/*
 * Attribute name cannot begin with '-' and must consist of
 * characters from [-A-Za-z0-9_.].
 */
if (namelen <= 0 || *name == '-')
-   return -1;
+   return 0;
while (namelen--) {
char ch = *name++;
if (! (ch == '-' || ch == '.' || ch == '_' ||
   ('0' <= ch && ch <= '9') ||
   ('a' <= ch && ch <= 'z') ||
   ('A' <= ch && ch <= 'Z')) )
-   return -1;
+   return 0;
}
-   return 0;
+   return 1;
+}
+
+void invalid_attr_name_message(struct strbuf *err, const char *name, int len)
+{
+   strbuf_addf(err, _("%.*s is not a valid attribute name"),
+   len, name);
+}
+
+static void report_invalid_attr(const char *name, size_t len,
+   const char *src, int lineno)
+{
+   struct strbuf err = STRBUF_INIT;
+   invalid_attr_name_message(, name, len);
+   fprintf(stderr, "%s: %s:%d\n", err.buf, src, lineno);
+   strbuf_release();
 }
 
 struct git_attr *git_attr_counted(const char *name, size_t len)
@@ -90,7 +105,7 @@ struct git_attr *git_attr_counted(const char *name, size_t 
len)
return a;
}
 
-   if (invalid_attr_name(name, len))
+   if (!attr_name_valid(name, len))
return NULL;
 
FLEX_ALLOC_MEM(a, name, name, len);
@@ -176,17 +191,15 @@ static const char *parse_attr(const char *src, int 
lineno, const char *cp,
cp++;
len--;
}
-   if (invalid_attr_name(cp, len)) {
-   fprintf(stderr,
-   "%.*s is not a valid attribute name: %s:%d\n",
-   len, cp, src, lineno);
+   if (!attr_name_valid(cp, len)) {
+   report_invalid_attr(cp, len, src, lineno);
return NULL;
}
} else {
/*
 * As this function is always called twice, once with
 * e == NULL in the first pass and then e != NULL in
-* the second pass, no need for invalid_attr_name()
+* the second pass, no need for attr_name_valid()
 * check here.
 */
if (*cp == '-' || *cp == '!') {
@@ -229,10 +242,8 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
name += strlen(ATTRIBUTE_MACRO_PREFIX);
name += strspn(name, blank);
namelen = strcspn(name, blank);
-   if (invalid_attr_name(name, namelen)) {
-   fprintf(stderr,
-   "%.*s is not a valid attribute name: %s:%d\n",
-   namelen, name, src, lineno);
+   if (!attr_name_valid(name, namelen)) {
+   report_invalid_attr(name, namelen, src, lineno);
goto fail_return;
}
}
diff --git a/attr.h b/attr.h
index bcedf92..d39e327 100644
--- a/attr.h
+++ b/attr.h
@@ -13,6 +13,8 @@ extern struct git_attr *git_attr(const char *);
 /* The same, but with counted string */
 extern struct git_attr *git_attr_counted(const char *, size_t);
 
+extern void invalid_attr_name_message(struct strbuf *, const char *, int);
+
 /* Internal use */
 extern const char git_attr__true[];
 extern const char git_attr__false[];
-- 
2.10.1.508.g6572022



Re: [PATCH 0/2] git-svn: implement "git worktree" awareness

2016-10-26 Thread Junio C Hamano
Eric Wong  writes:

> Eric Wong  wrote:
>> +Cc Jakub since gitweb could probably take advantage of get_record
>> from the first patch, too.  I'm not completely sure about the API
>> for this, though.
>
> Jakub: ping?
>
> +Cc: Junio, too.  I'm hoping to have this in 2.11.

I somehow was hoping that I can pull this as part of git-svn updates
for the upcoming release without having to even think about it (I
did read the patch when they were posted and did not find anything
wrong with them, fwiw).

>> The following changes since commit 3cdd5d19178a54d2e51b5098d43b57571241d0ab:
>> 
>>   Sync with maint (2016-10-11 14:55:48 -0700)
>> 
>> are available in the git repository at:
>> 
>>   git://bogomips.org/git-svn.git svn-wt
>> 
>> for you to fetch changes up to 112423eb905cf28c9445781a7647ba590d597ab3:
>> 
>>   git-svn: "git worktree" awareness (2016-10-14 01:36:12 +)

Thanks.


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote:
>
>> > I actually wonder if it is worth carrying around the O_NOATIME hack at
>> > all.
>> 
>> Yes, I share the thought.  We no longer have too many loose objects
>> to matter.
>> 
>> I do not mind flipping the order, but I'd prefer to cook the result
>> even longer.  I am tempted to suggest we take two step route:
>> 
>>  - ship 2.11 with the "atime has been there and we won't regress it"
>>shape, while cooking the "cloexec is semantically more
>>important" version in 'next' during the feature freeze
>> 
>>  - immediately after 2.11 merge it to 'master' for 2.12 to make sure
>>there is no fallout.
>
> That sounds reasonable, though I'd consider jumping straight to "NOATIME
> is not worth it; drop it" as the patch for post-2.11.

That endgame is fine by me too.  Thanks for a sanity-check.


Re: git-archive and submodules

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 1:37 PM, Anatoly Borodin
 wrote:
> are there plans to add submodules support to git-archive?

plans by whom?

Git is a project with contributors from all over the place. (different
time zones,
people motivated by different means, i.e. we have the hobbiest that
scratches their
itch, we have paid people working on Git because their employer wants
them to work on Git,
there are other people (who like to) use Git in their work environment
and hack on it
in their spare time to make it awesome.)

AFAICT there are currently not a lot of people actively working on
submodule features,
though there is some history, e.g. Jens Lehmann maintains a wiki
specialised on submodules
https://github.com/jlehmann/git-submod-enhancements/wiki
and archive is mentioned there as one of the many "Issues still to be tackled".

Maybe you want to give it a try as you need it? I'd be happy to review any
submodule related code.

How to get started:

 * git clone https://github.com/git/git
 * Read (at least skim) Documentation/SubmittingPatches)
 * Look at builtin/archive.c as a starting point (cmd_archive is called
when you call "git archive ...")
 * That leads to archive.c:write_archive, which calls parse_archive_args
  There we'd want to add an option there for recursing into submodules.
 * See write_archive_entry (still in archive.c) that mentions S_ISGITLINK
   Somewhere there you need to add code. :)

Thanks,
Stefan


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 1:40 PM, Johannes Sixt  wrote:
> Am 26.10.2016 um 22:26 schrieb Jeff King:
>>
>> On Wed, Oct 26, 2016 at 10:25:38PM +0200, Johannes Sixt wrote:
>>
>>> Am 26.10.2016 um 21:51 schrieb Stefan Beller:

 it is
 very convenient to not have to explicitly initialize mutexes?
>>>
>>>
>>> Not to initialize a mutex is still wrong for pthreads.
>>
>>
>> I think Stefan was being loose with his wording. There would still be an
>> initializer, but it would be a constant (and in the case of pthread
>> emulation on Windows, would just be NULL).
>
>
> And I was loose, too: Not to initialize a mutex with at least
> PTHREAD_MUTEX_INITILIZER (if not pthread_mutex_init) is still wrong.
>

My words were wrong, I meant statically initialized instead of the need to
call a function to initialize a mutex. (For the attribute subsystem, where would
that function go? We use attrs all over the place. My current thinking would
be in git.c to initialize the Big Single Attr Lock. I feel like that
is not very well
maintainable though).

Sorry for the confusion,
Stefan


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 1:26 PM, Jeff King  wrote:
> On Wed, Oct 26, 2016 at 10:25:38PM +0200, Johannes Sixt wrote:
>
>> Am 26.10.2016 um 21:51 schrieb Stefan Beller:
>> > it is
>> > very convenient to not have to explicitly initialize mutexes?
>>
>> Not to initialize a mutex is still wrong for pthreads.
>
> I think Stefan was being loose with his wording. There would still be an
> initializer, but it would be a constant (and in the case of pthread
> emulation on Windows, would just be NULL).

Exactly, so we would do

/* as per the man page of pthread_mutexes: */
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;

int somefunction()
{
pthread_mutex_lock(); /* threadsafely initialised on first use */
...
pthread_unlock();
}

and for the Windows compat we'd do


#define PTHREAD_MUTEX_INITIALIZER NULL
#define pthread_mutex_lock emulate_pthread_mutex_lock

int emulate_pthread_mutex_lock(volatile MUTEX_TYPE *mx)
{
if (*mx == NULL) /* static initializer? */
{ /* this stackoverflow magic to initialize threadsafely if not init'd */}

EnterCriticalSection(mx) /* as it currently is in compat/win32/pthread.h */
return 0;
}






>
> -Peff


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Johannes Sixt

Am 26.10.2016 um 22:26 schrieb Jeff King:

On Wed, Oct 26, 2016 at 10:25:38PM +0200, Johannes Sixt wrote:


Am 26.10.2016 um 21:51 schrieb Stefan Beller:

it is
very convenient to not have to explicitly initialize mutexes?


Not to initialize a mutex is still wrong for pthreads.


I think Stefan was being loose with his wording. There would still be an
initializer, but it would be a constant (and in the case of pthread
emulation on Windows, would just be NULL).


And I was loose, too: Not to initialize a mutex with at least 
PTHREAD_MUTEX_INITILIZER (if not pthread_mutex_init) is still wrong.


-- Hannes



git-archive and submodules

2016-10-26 Thread Anatoly Borodin
Hi All,


are there plans to add submodules support to git-archive?


-- 
Mit freundlichen Grüßen,
Anatoly Borodin



Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 10:25:38PM +0200, Johannes Sixt wrote:

> Am 26.10.2016 um 21:51 schrieb Stefan Beller:
> > it is
> > very convenient to not have to explicitly initialize mutexes?
> 
> Not to initialize a mutex is still wrong for pthreads.

I think Stefan was being loose with his wording. There would still be an
initializer, but it would be a constant (and in the case of pthread
emulation on Windows, would just be NULL).

-Peff


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Johannes Sixt

Am 26.10.2016 um 21:51 schrieb Stefan Beller:

it is
very convenient to not have to explicitly initialize mutexes?


Not to initialize a mutex is still wrong for pthreads.

-- Hannes



Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 12:51:02PM -0700, Stefan Beller wrote:

> > I seem to recall this does not work on Windows, where the pthread
> > functions are thin wrappers over CRITICAL_SECTION. Other threaded code
> > in git does an explicit setup step before entering threaded sections.
> > E.g., see start_threads() in builtin/grep.c.
> >
> 
> I wonder if we can have a similar thing as
> http://stackoverflow.com/a/9490113 in compat/win32/pthread.{h.c} as it is
> very convenient to not have to explicitly initialize mutexes?

I agree it would be much more convenient and get rid of some repetitive
boilerplate code. I'll leave it to Windows folks to decide if they are
OK with that approach or not (I do not offhand know of any reason it
would not work).

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote:

> > I actually wonder if it is worth carrying around the O_NOATIME hack at
> > all.
> 
> Yes, I share the thought.  We no longer have too many loose objects
> to matter.
> 
> I do not mind flipping the order, but I'd prefer to cook the result
> even longer.  I am tempted to suggest we take two step route:
> 
>  - ship 2.11 with the "atime has been there and we won't regress it"
>shape, while cooking the "cloexec is semantically more
>important" version in 'next' during the feature freeze
> 
>  - immediately after 2.11 merge it to 'master' for 2.12 to make sure
>there is no fallout.

That sounds reasonable, though I'd consider jumping straight to "NOATIME
is not worth it; drop it" as the patch for post-2.11.

-Peff


Re: [PATCH 0/2] git-svn: implement "git worktree" awareness

2016-10-26 Thread Eric Wong
Eric Wong  wrote:
> +Cc Jakub since gitweb could probably take advantage of get_record
> from the first patch, too.  I'm not completely sure about the API
> for this, though.

Jakub: ping?

+Cc: Junio, too.  I'm hoping to have this in 2.11.

> The following changes since commit 3cdd5d19178a54d2e51b5098d43b57571241d0ab:
> 
>   Sync with maint (2016-10-11 14:55:48 -0700)
> 
> are available in the git repository at:
> 
>   git://bogomips.org/git-svn.git svn-wt
> 
> for you to fetch changes up to 112423eb905cf28c9445781a7647ba590d597ab3:
> 
>   git-svn: "git worktree" awareness (2016-10-14 01:36:12 +)
> 
> 
> Eric Wong (2):
>   git-svn: reduce scope of input record separator change
>   git-svn: "git worktree" awareness
> 
>  git-svn.perl  | 13 +++--
>  perl/Git.pm   | 16 +++-
>  perl/Git/SVN.pm   | 24 +++-
>  perl/Git/SVN/Editor.pm| 12 +---
>  perl/Git/SVN/Fetcher.pm   | 15 +--
>  perl/Git/SVN/Migration.pm | 37 ++---
>  6 files changed, 69 insertions(+), 48 deletions(-)


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 5:15 AM, Jeff King  wrote:
> On Wed, Oct 26, 2016 at 11:35:58AM +0200, Simon Ruderich wrote:
>
>> >  static pthread_mutex_t attr_mutex;
>> > -#define attr_lock()pthread_mutex_lock(_mutex)
>> > +static inline void attr_lock(void)
>> > +{
>> > +   static int initialized;
>> > +
>> > +   if (!initialized) {
>> > +   pthread_mutex_init(_mutex, NULL);
>> > +   initialized = 1;
>> > +   }
>> > +   pthread_mutex_lock(_mutex);
>> > +}
>>
>> This may initialize the mutex multiple times during the first
>> lock (which may happen in parallel).
>>
>> pthread provides static initializers. To quote the man page:
>>
>> Variables of type pthread_mutex_t can also be initialized
>> statically, using the constants PTHREAD_MUTEX_INITIALIZER
>> (for fast mutexes), PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP
>> (for recursive mutexes), and
>> PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP (for error checking
>> mutexes).
>
> I seem to recall this does not work on Windows, where the pthread
> functions are thin wrappers over CRITICAL_SECTION. Other threaded code
> in git does an explicit setup step before entering threaded sections.
> E.g., see start_threads() in builtin/grep.c.
>

I wonder if we can have a similar thing as
http://stackoverflow.com/a/9490113 in compat/win32/pthread.{h.c} as it is
very convenient to not have to explicitly initialize mutexes?


Re: A bug with "git svn show-externals"

2016-10-26 Thread Eric Wong
Tao Peng  wrote:
> Hi there,
> 
> I met a bug of the "git svn show-externals” command.  If a subdirectory item 
> has a svn:externals property, and the format of the property is “URL first, 
> then the local path”, running "git svn show-externals” command at the root 
> level will result in an unusable output.
> 
> Example:
> $ svn pg svn:externals svn+ssh://src.foo.com/svn/ref/English.lproj/
> svn+ssh://src.foo.com/svn/orig/trunk/Resources/English.lproj/Localizable.strings
>  Localizable.strings

+Cc Vineet who originally implemented this 9 years ago

I've never used externals much, but I guess it's common for
externals to be a full URL and not merely a relative path
to somewhere within the same SVN repo.

> $ git svn show-externals
> # /English.lproj/
> /English.lproj/svn+ssh://src.foo.com/svn/orig/trunk/Resources/English.lproj/Localizable.strings
>  Localizable.strings

> 
> This bug is preventing my script from correctly finishing the svn-to-git repo 
> migration work. Does anyone know a workaround to this bug?

Can you try the following change to ignore path prefixing for
full URLs?

diff --git a/git-svn.perl b/git-svn.perl
index 4d41d22..ced665a 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1303,7 +1303,9 @@ sub cmd_show_externals {
my $s = $props->{'svn:externals'} or return;
$s =~ s/[\r\n]+/\n/g;
chomp $s;
-   $s =~ s#^#$path#gm;
+   if ($s !~ m#^[a-z\+]+://#i) {
+   $s =~ s#^#$path#gm;
+   }
print STDOUT "$s\n";
});
 }


Re: [PATCH] Documentation/git-diff: document git diff with 3+ commits

2016-10-26 Thread Junio C Hamano
Michael J Gruber  writes:

> That one is difficult to discover but super useful, so document it:
> Specifying 3 or more commits makes git diff switch to combined diff.
>
> Signed-off-by: Michael J Gruber 
> ---
>
> Notes:
> Note that we have the following now:
> ...
> 'git diff A..B' equivalent to 'git diff A B'
> in contrast to 'git log A..B' listing commits between M and B only
> (without the commits between M and A unless they are "in" B).

The standard answer is: 

Do not use two-dot form with 'git diff', if you find it
confusing.  Diff is about two endpoints, not about a range
between two.

The reason why we do not reject can be easily guessed by any
intelligent person when some historical background is given, I
think.

 - In the beginning A...B did not exist.  A..B was the only "range"
   notation.

 - "git log A..B" was in wide use.  Remember, "git log A...B" did
   not exist.

 - People started mistyping "git diff A..B", which looked as if the
   user typed "git diff ^A B" to the internal.

 - Git _could_ have rejected that as a bogus request to diff two
   points, ^A (what is that???) and B, but "What else could the user
   have meant with 'git diff A..B' other than 'git diff A B'?" was
   an argument to favor doing _something_ useful rather than
   erroring out.  Remember, "A...B" did not exist when this
   happened.

So there.  We may want to deprecate "diff A..B" but I personally do
not think it is worth the effort.  There is no substitute for the
current "diff A...B" to allow us to deprecate both at the same time,
which is the required first step if we want to eventually swap their
meaning in a far future version of Git which would have been better
if we had hindsight.  But remember, "A...B" did not exist back then. 

> diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
> index bbab35fcaf..2047318a27 100644
> --- a/Documentation/git-diff.txt
> +++ b/Documentation/git-diff.txt
> @@ -12,6 +12,7 @@ SYNOPSIS
>  'git diff' [options] [] [--] [...]
>  'git diff' [options] --cached [] [--] [...]
>  'git diff' [options]   [--] [...]
> +'git diff' [options][...]

Made me wonder "is [...] 0-or-more As or 1-or-more As?".
Don't we allow pathspecs in this case?

>  'git diff' [options]  
>  'git diff' [options] [--no-index] [--]  
>  
> @@ -75,9 +76,16 @@ two blob objects, or changes between two files on disk.
>   "git diff $(git-merge-base A B) B".  You can omit any one
>   of , which has the same effect as using HEAD instead.
>  
> +'git diff' [options][...]::
> +
> + This is to view a combined diff between the first 
> + and the remaining ones, just like viewing a combined diff
> + for a merge commit (see below) where the first 
> + is the merge commit and the remaining ones are the parents.
> +
>  Just in case if you are doing something exotic, it should be
>  noted that all of the  in the above description, except
> -in the last two forms that use ".." notations, can be any
> +in the two forms that use ".." notations, can be any
>  .
>  
>  For a more complete list of ways to spell , see


Re: [PATCH] hex: use unsigned index for ring buffer

2016-10-26 Thread Junio C Hamano
René Scharfe  writes:

> Actually I didn't sign-off on purpose originally.  But OK, let's keep
> the version below.  I just feel strangely sad seeing that concise magic
> go.  Nevermind.

I actually share the sadness, too, but let's be stupid and obvious
here.

Thanks.

>
> Signed-off-by: Rene Scharfe 
>
>> -- >8 --
>> From: René Scharfe 
>> Date: Sun, 23 Oct 2016 19:57:30 +0200
>> Subject: [PATCH] hex: make wraparound of the index into ring-buffer explicit
>>
>> Overflow is defined for unsigned integers, but not for signed ones.
>>
>> We could make the ring-buffer index in sha1_to_hex() and
>> get_pathname() unsigned to be on the safe side to resolve this, but
>> let's make it explicit that we are wrapping around at whatever the
>> number of elements the ring-buffer has.  The compiler is smart enough
>> to turn modulus into bitmask for these codepaths that use
>> ring-buffers of a size that is a power of 2.
>>
>> Signed-off-by: René Scharfe 
>> Signed-off-by: Junio C Hamano 
>> ---
>>  hex.c  | 3 ++-
>>  path.c | 3 ++-
>>  2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/hex.c b/hex.c
>> index ab2610e498..845b01a874 100644
>> --- a/hex.c
>> +++ b/hex.c
>> @@ -78,7 +78,8 @@ char *sha1_to_hex(const unsigned char *sha1)
>>  {
>>  static int bufno;
>>  static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
>> -return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1);
>> +bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
>> +return sha1_to_hex_r(hexbuffer[bufno], sha1);
>>  }
>>
>>  char *oid_to_hex(const struct object_id *oid)
>> diff --git a/path.c b/path.c
>> index fe3c4d96c6..9bfaeda207 100644
>> --- a/path.c
>> +++ b/path.c
>> @@ -24,7 +24,8 @@ static struct strbuf *get_pathname(void)
>>  STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
>>  };
>>  static int index;
>> -struct strbuf *sb = _array[3 & ++index];
>> +struct strbuf *sb = _array[index];
>> +index = (index + 1) % ARRAY_SIZE(pathname_array);
>>  strbuf_reset(sb);
>>  return sb;
>>  }
>>


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Junio C Hamano
Jeff King  writes:

> Of the two flags, I would say CLOEXEC is the more important one to
> respect because it may actually impact correctness (e.g., leaking
> descriptors to sub-processes). Whereas O_NOATIME is purely a performance
> optimization.

I tend to agree.

> I actually wonder if it is worth carrying around the O_NOATIME hack at
> all.

Yes, I share the thought.  We no longer have too many loose objects
to matter.

I do not mind flipping the order, but I'd prefer to cook the result
even longer.  I am tempted to suggest we take two step route:

 - ship 2.11 with the "atime has been there and we won't regress it"
   shape, while cooking the "cloexec is semantically more
   important" version in 'next' during the feature freeze

 - immediately after 2.11 merge it to 'master' for 2.12 to make sure
   there is no fallout.



Re: [PATCH] hex: use unsigned index for ring buffer

2016-10-26 Thread René Scharfe

Am 25.10.2016 um 20:28 schrieb Junio C Hamano:

Jeff King  writes:


On Mon, Oct 24, 2016 at 04:53:50PM -0700, Junio C Hamano wrote:


So how about this?  It gets rid of magic number 3 and works for array
size that's not a power of two.  And as a nice side effect it can't
trigger a signed overflow anymore.


Looks good to me.  Peff?


Any of the variants discussed in this thread is fine by me.


OK, here is what I'll queue then.
I assumed that René wants to sign it off ;-).


Actually I didn't sign-off on purpose originally.  But OK, let's keep
the version below.  I just feel strangely sad seeing that concise magic
go.  Nevermind.

Signed-off-by: Rene Scharfe 


-- >8 --
From: René Scharfe 
Date: Sun, 23 Oct 2016 19:57:30 +0200
Subject: [PATCH] hex: make wraparound of the index into ring-buffer explicit

Overflow is defined for unsigned integers, but not for signed ones.

We could make the ring-buffer index in sha1_to_hex() and
get_pathname() unsigned to be on the safe side to resolve this, but
let's make it explicit that we are wrapping around at whatever the
number of elements the ring-buffer has.  The compiler is smart enough
to turn modulus into bitmask for these codepaths that use
ring-buffers of a size that is a power of 2.

Signed-off-by: René Scharfe 
Signed-off-by: Junio C Hamano 
---
 hex.c  | 3 ++-
 path.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hex.c b/hex.c
index ab2610e498..845b01a874 100644
--- a/hex.c
+++ b/hex.c
@@ -78,7 +78,8 @@ char *sha1_to_hex(const unsigned char *sha1)
 {
static int bufno;
static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
-   return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1);
+   bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
+   return sha1_to_hex_r(hexbuffer[bufno], sha1);
 }

 char *oid_to_hex(const struct object_id *oid)
diff --git a/path.c b/path.c
index fe3c4d96c6..9bfaeda207 100644
--- a/path.c
+++ b/path.c
@@ -24,7 +24,8 @@ static struct strbuf *get_pathname(void)
STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
};
static int index;
-   struct strbuf *sb = _array[3 & ++index];
+   struct strbuf *sb = _array[index];
+   index = (index + 1) % ARRAY_SIZE(pathname_array);
strbuf_reset(sb);
return sb;
 }



Re: [PATCH] reset: --unmerge

2016-10-26 Thread Junio C Hamano
Duy Nguyen  writes:

> Interestingly the thread/bug that resulted in that commit started with
> "report this bug to git" [2]. Something about git-stash. I quote the
> original mail here in case anyone wants to look into it (not sure if
> it's actually reported here before, I don't pay much attention to
> git-stash mails)
>
> -- 8< --
> Bad news, everyone!
>
> When a stash contains changes for several files, and "stash pop"
> encounters conflicts only in some of them, the rest of the files are
> stages automatically.

It indeed is curious.

That is the designed behaviour for _ANY_ mergy operation, and not
limited to "stash pop".

A clean application is added to the index so that you can find out
about them from "diff --cached", while conflicted ones keep their
unmerged stages so that the conflict can be resolved in the working
tree files.  There is no bad news here.

Once you resolve the conflict, you would add the final contents to
the working tree, but as anybody who knows how "git diff" after
resolving conflicts in the working tree files is useful would know,
"saving the editor buffer after removing conflict markers" is not a
valid signal that the user is confident that the contents is final.

> At least, that happens with Git 2.1.0 on my machine, and some
> commenters here: http://stackoverflow.com/a/1237337/615245
>
> So then when we unstage the files which had conflicts after resolving
> those, the result is mixed. Which doesn't look right.

Whoever wrote this does not understand how mergy operations in Git
works, I guess.

> What shall we do? Unstage the automatically-staged files? Revert the
> changes from this bug? It seems Git really wants the changes staged
> after the conflict resolution.

The first order of business is to learn how mergy operations in Git
is designed to work, and if they are in the business of building a
tool around Git to make the life of users better, avoid going against
the designed workflow.

If this "Bad news, everyone!" is why vc-git-resolve-conflicts was
added and defaults to true, I can feel safe in toggling it off
forever in my ~/.emacs, knowing that it is a totally broken option
that came from a desire to fix a problem that does not exist.



Re: [PATCH] rebase: add --forget to cleanup rebase, leave HEAD untouched

2016-10-26 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> There are occasions when you decide to abort an in-progress rebase and
> move on to do something else but you forget to do "git rebase --abort"
> first. Or the rebase has been in progress for so long you forgot about
> it. By the time you realize that (e.g. by starting another rebase)
> it's already too late to retrace your steps. The solution is normally
>
> rm -r .git/
>
> and continue with your life. But there could be two different
> directories for  (and it obviously requires some
> knowledge of how rebase works), and the ".git" part could be much
> longer if you are not at top-dir, or in a linked worktree. And
> "rm -r" is very dangerous to do in .git, a mistake in there could
> destroy object database or other important data.
>
> Provide "git rebase --forget" for this exact use case.

Two and a half comments.

 - The title says "leave HEAD untouched".  Are my working tree files
   and my index also safe from this operation, or is HEAD the only
   thing that is protected?

 - I think I saw a variant of this gotcha for an unconcluded
   cherry-pick that was left behind, which the bash-prompt script
   did not notice but the next "git cherry-pick" did by complaining
   "you are in the middle" or something like that.  Perhaps we would
   want to have a similarly sounding option to help that case, too,
   not in this patch but as another patch on the same theme?

 - Would it have helped if bash-prompt were in use?  I am not saying
   that this patch becomes unnecessary if you use it; I am trying to
   see if it helps its users by reminding them what state they are
   in.



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 09:23:21AM -0700, Junio C Hamano wrote:

> >> +  /* Might the failure be due to O_NOATIME? */
> >> +  if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
> >> +  sha1_file_open_flag &= ~O_NOATIME;
> >> +  continue;
> >> +  }
> >
> > We drop O_NOATIME, and end up with an empty flag field.
> >
> > But we will never have tried just O_CLOEXEC, which might have worked.
> 
> Yes, doing so would smudge atime, so one question is which one
> between noatime or cloexec is more important to be done at open(2)
> time.

Yes, but the missing case is one where we know that O_NOATIME does not
work (but O_CLOEXEC does), so we know we have to smudge the atime.

Of the two flags, I would say CLOEXEC is the more important one to
respect because it may actually impact correctness (e.g., leaking
descriptors to sub-processes). Whereas O_NOATIME is purely a performance
optimization.

I actually wonder if it is worth carrying around the O_NOATIME hack at
all.  Linus added it on 2005-04-23 via 144bde78e9; the aim was to reduce
the cost of opening loose object files. Some things have changed since
then:

  1. In June 2005, git learned about packfiles, which means we would do
 a lot fewer atime updates (rather than one per object access, we'd
 generally get one per packfile).

  2. In late 2006, Linux learned about "relatime", which is generally
 the default on modern installs. So performance around atime updates
 is a non-issue there these days.

 All the world isn't Linux, of course, but I can't help that feel
 that atime performance hackery is something that belongs at the
 system level, not in individual applications.

So I don't have hard numbers, but I'd be surprised if O_NOATIME is
really buying us anything these days.

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Junio C Hamano
Jeff King  writes:

>> +/* Try again w/o O_CLOEXEC: the kernel might not support it */
>> +if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) {
>> +sha1_file_open_flag &= ~O_CLOEXEC;
>>  continue;
>>  }
>
> So if we start with O_CLOEXEC|O_NOATIME, we drop CLOEXEC here and try
> again with just O_NOATIME. And then if _that_ fails...
>
>> +/* Might the failure be due to O_NOATIME? */
>> +if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
>> +sha1_file_open_flag &= ~O_NOATIME;
>> +continue;
>> +}
>
> We drop O_NOATIME, and end up with an empty flag field.
>
> But we will never have tried just O_CLOEXEC, which might have worked.

Yes, doing so would smudge atime, so one question is which one
between noatime or cloexec is more important to be done at open(2)
time.

It may be possible to open(2) only with cloexec and then fcntl(2)
FD_SET noatime immediately after, but going that way would explode
the combination even more, as it may not be possible to set these
two flags the other way around.

> I'm not sure it's worth worrying about or not; I don't know which
> systems are actually lacking either of the flags, or if they tend to
> have both.



Re: [PATCH v1 00/19] Add configuration options for split-index

2016-10-26 Thread Junio C Hamano
Duy Nguyen  writes:

> On Wed, Oct 26, 2016 at 12:21 AM, Junio C Hamano  wrote:
>
> Even if we ignore user index files (by forcing them all to be stored
> in one piece), there is a problem with the special temporary file
> index.lock, which must use split-index because it will become the new
> index. Handling race conditions could be tricky with ref counting.
> Timestamps help in this regard.

I actually think using the split-index only for the $GIT_DIR/index,
the primary one, and using the full index for others is a bad idea,
as we use temporary index ourselves when making partial commits,
which happens quite often.

So time-based GC it is.

I actually do not think index.lock is a problem, but is a solution,
if we only limit the split-index to the primary one (we can have at
most one, so we can GC the stale shared ones while holding the
lock).

But that is no longer important.




Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-26 Thread Junio C Hamano
Duy Nguyen  writes:

> I don't object the alias.. approach though. It's
> definitely a cleaner one in my opinion. It just needs people who can
> spend time to follow up until the end. But if someone decides to do
> that now, I'll drop the "(properties)!command" and try to support
> him/her.

I don't object to either approach, but what I would love to see
people avoid is to end up with both.

Thanks.


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Duy Nguyen
On Tue, Oct 25, 2016 at 2:18 AM, Stefan Beller  wrote:
> On Mon, Oct 24, 2016 at 11:55 AM, Junio C Hamano  wrote:
>
>>
>> Make that a double-asterisk.  The same problem appears in an updated
>> example in technical/api-gitattributes.txt doc, but the example in
>> the commit log message (below) is correct.
>
> The implementation is actually using a double pointer, see below,
> I forgot commit message and documentation
>
>>> GIT_ATTR_RESULT_INIT_FOR(myresult, 1);
>>
>> Are you sure about this?  We've called attr_check_initl() already so
>> if this is declaring myresult, it would be decl-after-stmt.
>
> I forgot to update the commit message and Documentation.
> GIT_ATTR_RESULT_INIT_FOR is gone

I was asking whether this function/macro was not thread-safe and found
out it didn't exist as well, and it's bed time so I'm stopping. Will
continue my skimming on the next re-roll :)
-- 
Duy


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Duy Nguyen
On Sun, Oct 23, 2016 at 6:32 AM, Stefan Beller  wrote:
> This revamps the API of the attr subsystem to be thread safe.
> Before we had the question and its results in one struct type.
> The typical usage of the API was
>
> static struct git_attr_check *check;
>
> if (!check)
> check = git_attr_check_initl("text", NULL);


Two cents. I read the .txt first and my first thought was "is _initl a
typo, shouldn't it be jsut _init"? I know we have this 'l' variant at
least in argv-array, but there we have many ways of adding arguments.
And here it's just "initl", not "init" nor other "initX" variants,
which looks odd. I wonder if the name git_attr_check_init() would do
the job fine, since we don't have different init variants and the
naming convention is not strong enough to tell me "it's multiple
arguments ended with a NULL one" right away. If you're worried about
people forgetting NULL at the end, how about passing an array of
strings, terminated by NULL, instead?

Just thinking out loud. Maybe _initl _is_ a better name.
-- 
Duy


Re: [PATCH 32/36] pathspec: allow querying for attributes

2016-10-26 Thread Duy Nguyen
(sorry if this should have been answered if I went through the series
patch by patch, I wanted to do a proper review but finally have to
admit to myself I won't, so I just skim through a single giant diff
instead)

On Sun, Oct 23, 2016 at 6:32 AM, Stefan Beller  wrote:
> +attr;;
> +After `attr:` comes a space separated list of "attribute
> +requirements", all of which must be met in order for the
> +path to be considered a match;

What about (attr=abc def,attr=ghi lkj)? Does it mean (abc && def) ||
(ghi && lkj), or abc && def && ghi && lkj? Or is it forbidden to have
multiple 'attr' attribute in the same pathspec?
-- 
Duy


Re: [PATCH/RFC] git.c: support "!!" aliases that do not move cwd

2016-10-26 Thread Duy Nguyen
On Tue, Oct 11, 2016 at 10:01 PM, Jeff King  wrote:
> On Tue, Oct 11, 2016 at 11:44:50AM +0200, Johannes Schindelin wrote:
>
>> > Yeah, that's reasonable, too. So:
>> >
>> >   [alias]
>> > d2u = "!dos2unix"
>> >
>> > acts exactly as if:
>> >
>> >   [alias "d2u"]
>> > command = dos2unix
>> > type = shell
>> >
>> > was specified at that point, which is easy to understand.
>>
>> It is easy to understand, and even easier to get wrong or out of sync. I
>> really liked the ease of *one* `git config` call to add new aliases. Not
>> sure that I like the need for more such calls just to add *one* alias (one
>> config call for "shell", one for "don't cd up", etc).
>
> Could we simply support alias.d2u indefinitely, and you could use
> whichever format you felt like (the shorter, more limited one if you
> wanted, or the more verbose but flexible one)?

Before this thread goes completely dead... Since there's a lot more
work involved with the new alias.. approach (short
term would be git completion support, longer term would be the ability
to manipulate a config group more conveniently), I'm going with the
"(properties)!command" approach. But even then a new series is not
going to pop up, like, in the next two months.

I don't object the alias.. approach though. It's
definitely a cleaner one in my opinion. It just needs people who can
spend time to follow up until the end. But if someone decides to do
that now, I'll drop the "(properties)!command" and try to support
him/her.
-- 
Duy


Re: [Question] Git histrory after greybus merge

2016-10-26 Thread Dmitry Safonov
Adding Cc: git list, Junio.

2016-10-26 15:55 GMT+03:00 Dmitry Safonov <0x7f454...@gmail.com>:
> Hi,
>
> Is there any way to specify git-log or git-rev-list which root tree to use?
> I mean, I got the following situation:
> I saw the commit a67dd266adf4 ("netfilter: xtables: prepare for
> on-demand hook register")
> by git-blame and want to see commits on top of that particular commit.
> Earlier I've used for that:
> $ git log --reverse a67dd266adf4^..HEAD
>
> But now after merging greybus it follows the greybus's tree and shows me:
> [linux]$ git log --reverse a67dd266adf4^..HEAD --oneline
> cd26f1bd6bf3 greybus: Initial commit
> c8a797a98cb6 greybus: Import most recent greybus code to new repo.
> 06823c3eb9c4 greybus: README and .gitignore updates
>
> Which quite sucks as this isn't a hash I'm referencing.
> Anyway, back to the question, is there any option to tell git which tree to 
> use?
> I'm sure this was asked before (on btrfs merge?), but I didn't find
> the answer so far.
> I'm using git v2.10.1 if anything.
>
> Thanks,
>  Dmitry


Re: [PATCH 7/7] setup_git_env: avoid blind fall-back to ".git"

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 07:26:20PM +0700, Duy Nguyen wrote:

> > I'm not sure this is really any convenience over dumping a corefile
> > and using gdb to pull out the
> > symbols after the fact.
> 
> So are we back to forcing core files? I'm ok with that! The only
> inconvenience I see is pointing out where the core file is, which
> should be where `pwd` originally is. On linux we can even peek into
> /proc/sys/kernel/core_pattern if we want to be precise. ulimit can get
> in the way, but I don't if the default out there is enable or disable
> core dumping. Once we got OS info and git version, our chances of
> cracking the core files should be reasonably high.

TBH, most of the time I expect the solution to be walking the person
through:

  git clone git://kernel.org/pub/scm/git/git.git
  cd git
  make

  gdb --args ./git whatever
  break die
  run
  bt

which would cover most cases (reproducible breakage, and not in a
sub-program). It's relatively rare that even I resort to corefiles.

-Peff


Re: [PATCH 7/7] setup_git_env: avoid blind fall-back to ".git"

2016-10-26 Thread Duy Nguyen
On Wed, Oct 26, 2016 at 7:10 PM, Jeff King  wrote:
> On Wed, Oct 26, 2016 at 05:29:21PM +0700, Duy Nguyen wrote:
>
>> > I think you could conditionally make git_path() and all of its
>> > counterparts macros, similar to the way the trace code works. It seems
>> > like a pretty maintenance-heavy solution, though. I'd prefer
>> > conditionally compiling backtrace(); that also doesn't hit 100% of
>> > cases, but at least it isn't too invasive.
>>
>> OK, a more polished patch is this. There are warnings about
>> -fomit-function-pointers in glibc man page, at least in my simple
>> tests it does not cause any issue.
>
> Yeah, I tried with -fno-omit-frame-pointer, but it didn't help. The
> glibc backtrace(3) manpage specifically says:
>
>   The symbol names may be unavailable without the use of special linker
>   options. For systems using the GNU linker, it is necessary to use the
>   -rdynamic linker option. Note that names of "static" functions are not
>   exposed, and won't be available in the backtrace.
>
> which matches the behavior I get.
>
> Gcc ships with a libbacktrace which does seem to give reliable results
> (patch below for reference). But that's still relying on gcc, and on
> having debug symbols available.

Yep. On an optimized build you can't get anywhere without debug info,
which has a giant database to describe "if your rip/pc register is
here, then you clue to find your caller is there" for basically every
instruction in your program. Dwarf3 at least is a crazy world.

> I'm not sure this is really any convenience over dumping a corefile and using 
> gdb to pull out the
> symbols after the fact.

So are we back to forcing core files? I'm ok with that! The only
inconvenience I see is pointing out where the core file is, which
should be where `pwd` originally is. On linux we can even peek into
/proc/sys/kernel/core_pattern if we want to be precise. ulimit can get
in the way, but I don't if the default out there is enable or disable
core dumping. Once we got OS info and git version, our chances of
cracking the core files should be reasonably high.
-- 
Duy


Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 11:35:58AM +0200, Simon Ruderich wrote:

> >  static pthread_mutex_t attr_mutex;
> > -#define attr_lock()pthread_mutex_lock(_mutex)
> > +static inline void attr_lock(void)
> > +{
> > +   static int initialized;
> > +
> > +   if (!initialized) {
> > +   pthread_mutex_init(_mutex, NULL);
> > +   initialized = 1;
> > +   }
> > +   pthread_mutex_lock(_mutex);
> > +}
> 
> This may initialize the mutex multiple times during the first
> lock (which may happen in parallel).
> 
> pthread provides static initializers. To quote the man page:
> 
> Variables of type pthread_mutex_t can also be initialized
> statically, using the constants PTHREAD_MUTEX_INITIALIZER
> (for fast mutexes), PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP
> (for recursive mutexes), and
> PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP (for error checking
> mutexes).

I seem to recall this does not work on Windows, where the pthread
functions are thin wrappers over CRITICAL_SECTION. Other threaded code
in git does an explicit setup step before entering threaded sections.
E.g., see start_threads() in builtin/grep.c.

-Peff


Re: [PATCH 7/7] setup_git_env: avoid blind fall-back to ".git"

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 05:29:21PM +0700, Duy Nguyen wrote:

> > I think you could conditionally make git_path() and all of its
> > counterparts macros, similar to the way the trace code works. It seems
> > like a pretty maintenance-heavy solution, though. I'd prefer
> > conditionally compiling backtrace(); that also doesn't hit 100% of
> > cases, but at least it isn't too invasive.
> 
> OK, a more polished patch is this. There are warnings about
> -fomit-function-pointers in glibc man page, at least in my simple
> tests it does not cause any issue.

Yeah, I tried with -fno-omit-frame-pointer, but it didn't help. The
glibc backtrace(3) manpage specifically says:

  The symbol names may be unavailable without the use of special linker
  options. For systems using the GNU linker, it is necessary to use the
  -rdynamic linker option. Note that names of "static" functions are not
  exposed, and won't be available in the backtrace.

which matches the behavior I get.

Gcc ships with a libbacktrace which does seem to give reliable results
(patch below for reference). But that's still relying on gcc, and on
having debug symbols available. I'm not sure this is really any
convenience over dumping a corefile and using gdb to pull out the
symbols after the fact.

---
diff --git a/config.mak.uname b/config.mak.uname
index 76f885281c..62a14f10d3 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -40,6 +40,8 @@ ifeq ($(uname_S),Linux)
NEEDS_LIBRT = YesPlease
HAVE_GETDELIM = YesPlease
SANE_TEXT_GREP=-a
+   BASIC_CFLAGS += -DHAVE_BACKTRACE
+   EXTLIBS += -lbacktrace
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_ALLOCA_H = YesPlease
diff --git a/usage.c b/usage.c
index 17f52c1b5c..4b9762ae62 100644
--- a/usage.c
+++ b/usage.c
@@ -5,6 +5,9 @@
  */
 #include "git-compat-util.h"
 #include "cache.h"
+#ifdef HAVE_BACKTRACE
+#include 
+#endif
 
 static FILE *error_handle;
 static int tweaked_error_buffering;
@@ -24,6 +27,44 @@ void vreportf(const char *prefix, const char *err, va_list 
params)
fputc('\n', fh);
 }
 
+#ifdef HAVE_BACKTRACE
+static int full_callback(void *fh, uintptr_t pc,
+const char *filename, int lineno,
+const char *function)
+{
+   if (!function || !filename)
+   return 0;
+   fprintf(fh, "debug:  %s() at %s:%d\n", function, filename, lineno);
+   return 0;
+}
+
+static void error_callback(void *fh, const char *msg, int errnum)
+{
+   fprintf(fh, "backtrace error: %s", msg);
+   if (errnum > 0)
+   fprintf(fh, ": %s", strerror(errnum));
+   fputc('\n', fh);
+}
+
+static void maybe_backtrace(void)
+{
+   FILE *fh = error_handle ? error_handle : stderr;
+   struct backtrace_state *bt;
+
+   if (!git_env_bool("GIT_BACKTRACE_ON_DIE", 0))
+   return;
+
+   /* XXX obviously unportable use of /proc */
+   bt = backtrace_create_state("/proc/self/exe", 0, error_callback, fh);
+   if (bt) {
+   fprintf(fh, "debug: die() called at:\n");
+   backtrace_full(bt, 3, full_callback, error_callback, fh);
+   }
+}
+#else
+#define maybe_backtrace()
+#endif
+
 static NORETURN void usage_builtin(const char *err, va_list params)
 {
vreportf("usage: ", err, params);
@@ -33,6 +74,7 @@ static NORETURN void usage_builtin(const char *err, va_list 
params)
 static NORETURN void die_builtin(const char *err, va_list params)
 {
vreportf("fatal: ", err, params);
+   maybe_backtrace();
exit(128);
 }
 


Re: [PATCH 7/7] setup_git_env: avoid blind fall-back to ".git"

2016-10-26 Thread Duy Nguyen
On Tue, Oct 25, 2016 at 11:15:25AM -0400, Jeff King wrote:
> > The "once we've identified" part could be tricky though. This message
> > alone will not give us any clue where it's called since it's buried
> > deep in git_path() usually, which is buried deep elsewhere. Without
> > falling back to core dumps (with debug info), glibc's backtrace
> > (platform specifc), the best we could do is turn git_path() into a
> > macro that takes __FILE__ and __LINE__ and somehow pass the info down
> > here, but "..." in macros is C99 specific, sigh..
> > 
> > Is it too bad to turn git_path() into a macro when we know the
> > compiler is C99 ? Older compilers will have no source location info in
> > git_path(), Hopefully they are rare, which means chances of this fault
> > popping up are also reduced.
> 
> I think you could conditionally make git_path() and all of its
> counterparts macros, similar to the way the trace code works. It seems
> like a pretty maintenance-heavy solution, though. I'd prefer
> conditionally compiling backtrace(); that also doesn't hit 100% of
> cases, but at least it isn't too invasive.

OK, a more polished patch is this. There are warnings about
-fomit-function-pointers in glibc man page, at least in my simple
tests it does not cause any issue.

> But I think I still prefer just letting the corefile and the debugger do
> their job. This error shouldn't happen much, and when it does, it should
> be easily reproducible. Getting the bug reporter to give either a
> reproduction recipe, or to run "gdb git" doesn't seem like that big a
> hurdle.

There are other places where backtrace() support may come handy
too. If -rdynamic was not needed, I would push for this patch. Too bad
backtrace() is not a perfect magic wand.

-- 8< --
diff --git a/config.mak.uname b/config.mak.uname
index b232908..b38f62a 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -40,6 +40,8 @@ ifeq ($(uname_S),Linux)
NEEDS_LIBRT = YesPlease
HAVE_GETDELIM = YesPlease
SANE_TEXT_GREP=-a
+   # for backtrace() support with glibc >= 2.1
+   BASIC_LDFLAGS += -rdynamic
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_ALLOCA_H = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index 43718da..3561ab9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -408,6 +408,7 @@ extern NORETURN void usage(const char *err);
 extern NORETURN void usagef(const char *err, ...) __attribute__((format 
(printf, 1, 2)));
 extern NORETURN void die(const char *err, ...) __attribute__((format (printf, 
1, 2)));
 extern NORETURN void die_errno(const char *err, ...) __attribute__((format 
(printf, 1, 2)));
+extern NORETURN void BUG(const char *err, ...) __attribute__((format (printf, 
1, 2)));
 extern int error(const char *err, ...) __attribute__((format (printf, 1, 2)));
 extern int error_errno(const char *err, ...) __attribute__((format (printf, 1, 
2)));
 extern void warning(const char *err, ...) __attribute__((format (printf, 1, 
2)));
@@ -709,6 +710,7 @@ extern int git_vsnprintf(char *str, size_t maxsize,
 #ifdef __GLIBC_PREREQ
 #if __GLIBC_PREREQ(2, 1)
 #define HAVE_STRCHRNUL
+#define HAVE_BACKTRACE
 #endif
 #endif
 
@@ -722,6 +724,23 @@ static inline char *gitstrchrnul(const char *s, int c)
 }
 #endif
 
+#ifdef HAVE_BACKTRACE
+#include 
+static inline void dump_backtrace(FILE *fp)
+{
+   void *buffer[32];
+   int nptrs;
+
+   nptrs = backtrace(buffer, sizeof(buffer) / sizeof(*buffer));
+   fflush(fp);
+   backtrace_symbols_fd(buffer, nptrs, fileno(fp));
+}
+#else
+static inline void dump_backtrace(FILE *fp)
+{
+}
+#endif
+
 #ifdef NO_INET_PTON
 int inet_pton(int af, const char *src, void *dst);
 #endif
diff --git a/usage.c b/usage.c
index 17f52c1..b00603c 100644
--- a/usage.c
+++ b/usage.c
@@ -204,3 +204,16 @@ void warning(const char *warn, ...)
warn_routine(warn, params);
va_end(params);
 }
+
+void NORETURN BUG(const char *fmt, ...)
+{
+   va_list params;
+
+   va_start(params, fmt);
+   vreportf("BUG: ", fmt, params);
+   va_end(params);
+
+   dump_backtrace(error_handle ? error_handle : stderr);
+
+   exit(128);
+}
-- 8< --
--
Duy


[PATCH] rebase: add --forget to cleanup rebase, leave HEAD untouched

2016-10-26 Thread Nguyễn Thái Ngọc Duy
There are occasions when you decide to abort an in-progress rebase and
move on to do something else but you forget to do "git rebase --abort"
first. Or the rebase has been in progress for so long you forgot about
it. By the time you realize that (e.g. by starting another rebase)
it's already too late to retrace your steps. The solution is normally

rm -r .git/

and continue with your life. But there could be two different
directories for  (and it obviously requires some
knowledge of how rebase works), and the ".git" part could be much
longer if you are not at top-dir, or in a linked worktree. And
"rm -r" is very dangerous to do in .git, a mistake in there could
destroy object database or other important data.

Provide "git rebase --forget" for this exact use case.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-rebase.txt   |  5 -
 contrib/completion/git-completion.bash |  4 ++--
 git-rebase.sh  |  6 +-
 t/t3407-rebase-abort.sh| 24 
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index de222c8..5a58fb3 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -12,7 +12,7 @@ SYNOPSIS
[ []]
 'git rebase' [-i | --interactive] [options] [--exec ] [--onto ]
--root []
-'git rebase' --continue | --skip | --abort | --edit-todo
+'git rebase' --continue | --skip | --abort | --forget | --edit-todo
 
 DESCRIPTION
 ---
@@ -252,6 +252,9 @@ leave out at most one of A and B, in which case it defaults 
to HEAD.
will be reset to where it was when the rebase operation was
started.
 
+--forget::
+   Abort the rebase operation but leave HEAD where it is.
+
 --keep-empty::
Keep the commits that do not change anything from its
parents in the result.
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 9c8f738..bb64fb4 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1733,10 +1733,10 @@ _git_rebase ()
 {
local dir="$(__gitdir)"
if [ -f "$dir"/rebase-merge/interactive ]; then
-   __gitcomp "--continue --skip --abort --edit-todo"
+   __gitcomp "--continue --skip --abort --forget --edit-todo"
return
elif [ -d "$dir"/rebase-apply ] || [ -d "$dir"/rebase-merge ]; then
-   __gitcomp "--continue --skip --abort"
+   __gitcomp "--continue --skip --abort --forget"
return
fi
__git_complete_strategy && return
diff --git a/git-rebase.sh b/git-rebase.sh
index 04f6e44..de712b7 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -43,6 +43,7 @@ continue!  continue
 abort! abort and check out the original branch
 skip!  skip current patch and continue
 edit-todo! edit the todo list during an interactive rebase
+forget!abort but keep HEAD where it is
 "
 . git-sh-setup
 set_reflog_action rebase
@@ -241,7 +242,7 @@ do
--verify)
ok_to_skip_pre_rebase=
;;
-   --continue|--skip|--abort|--edit-todo)
+   --continue|--skip|--abort|--forget|--edit-todo)
test $total_argc -eq 2 || usage
action=${1##--}
;;
@@ -399,6 +400,9 @@ abort)
finish_rebase
exit
;;
+forget)
+   exec rm -rf "$state_dir"
+   ;;
 edit-todo)
run_specific_rebase
;;
diff --git a/t/t3407-rebase-abort.sh b/t/t3407-rebase-abort.sh
index a6a6c40..6bc5e71 100755
--- a/t/t3407-rebase-abort.sh
+++ b/t/t3407-rebase-abort.sh
@@ -99,4 +99,28 @@ testrebase() {
 testrebase "" .git/rebase-apply
 testrebase " --merge" .git/rebase-merge
 
+test_expect_success 'rebase --forget' '
+   cd "$work_dir" &&
+   # Clean up the state from the previous one
+   git reset --hard pre-rebase &&
+   test_must_fail git rebase master &&
+   test_path_is_dir .git/rebase-apply &&
+   head_before=$(git rev-parse HEAD) &&
+   git rebase --forget &&
+   test $(git rev-parse HEAD) = $head_before &&
+   test ! -d .git/rebase-apply
+'
+
+test_expect_success 'rebase --merge --forget' '
+   cd "$work_dir" &&
+   # Clean up the state from the previous one
+   git reset --hard pre-rebase &&
+   test_must_fail git rebase --merge master &&
+   test_path_is_dir .git/rebase-merge &&
+   head_before=$(git rev-parse HEAD) &&
+   git rebase --forget &&
+   test $(git rev-parse HEAD) = $head_before &&
+   test ! -d .git/rebase-merge
+'
+
 test_done
-- 
2.8.2.524.g6ff3d78



Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Simon Ruderich
On Wed, Oct 26, 2016 at 10:52:23AM +0200, Johannes Schindelin wrote:
> diff --git a/attr.c b/attr.c
> index d5a6aa9..6933504 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -50,7 +50,16 @@ static struct git_attr *(git_attr_hash[HASHSIZE]);
>  #ifndef NO_PTHREADS
>
>  static pthread_mutex_t attr_mutex;
> -#define attr_lock()pthread_mutex_lock(_mutex)
> +static inline void attr_lock(void)
> +{
> + static int initialized;
> +
> + if (!initialized) {
> + pthread_mutex_init(_mutex, NULL);
> + initialized = 1;
> + }
> + pthread_mutex_lock(_mutex);
> +}

This may initialize the mutex multiple times during the first
lock (which may happen in parallel).

pthread provides static initializers. To quote the man page:

Variables of type pthread_mutex_t can also be initialized
statically, using the constants PTHREAD_MUTEX_INITIALIZER
(for fast mutexes), PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP
(for recursive mutexes), and
PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP (for error checking
mutexes).

Regards
Simon
-- 
+ Privatsphäre ist notwendig
+ Ich verwende GnuPG http://gnupg.org
+ Öffentlicher Schlüssel: 0x92FEFDB7E44C32F9


signature.asc
Description: PGP signature


Re: [PATCH] reset: --unmerge

2016-10-26 Thread Duy Nguyen
On Wed, Oct 26, 2016 at 6:28 AM, Junio C Hamano  wrote:
> Somebody with a bright idea decided that vc-git-resolve-conflicts
> variable should be on by default in Emacs 25.1 X-<,

Oh good, I have an excuse to stick to 24.5.1 for a while longer then.

>  which causes
> "save" after resolving conflicts to automatically run "git add", to
> destroy this valuable tool.  My knee-jerk reaction, of course, to
> such a default is "that's brain dead", but perhaps they did so for
> some good reason that I fail to fathom.

I was curious. The default is t since the variable's introduction [1].
Interestingly the thread/bug that resulted in that commit started with
"report this bug to git" [2]. Something about git-stash. I quote the
original mail here in case anyone wants to look into it (not sure if
it's actually reported here before, I don't pay much attention to
git-stash mails)

-- 8< --
Bad news, everyone!

When a stash contains changes for several files, and "stash pop"
encounters conflicts only in some of them, the rest of the files are
stages automatically.

At least, that happens with Git 2.1.0 on my machine, and some
commenters here: http://stackoverflow.com/a/1237337/615245

So then when we unstage the files which had conflicts after resolving
those, the result is mixed. Which doesn't look right.

What shall we do? Unstage the automatically-staged files? Revert the
changes from this bug? It seems Git really wants the changes staged
after the conflict resolution.
-- 8< --

[1] 
http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=45651154473c7d2f16230da765d034ecfde7968a
[2] https://lists.gnu.org/archive/html/bug-gnu-emacs/2015-05/msg00433.html
-- 
Duy


Re: [PATCH v1 00/19] Add configuration options for split-index

2016-10-26 Thread Duy Nguyen
On Wed, Oct 26, 2016 at 12:21 AM, Junio C Hamano  wrote:
>> Timestamps allow us to say, ok this base index file has not been read
>> by anybody for N+ hours (or better, days), it's most likely not
>> referenced by any temporary index files (including
>> $GIT_DIR/index.lock) anymore because those files, by the definition of
>> "temporary", must be gone by now
>
> and if we guessed wrong, users will have a "temporary index" that
> they meant to keep for longer term that is now broken here.  I am
> not sure if that risk is worth taking.

Even if we ignore user index files (by forcing them all to be stored
in one piece), there is a problem with the special temporary file
index.lock, which must use split-index because it will become the new
index. Handling race conditions could be tricky with ref counting.
Timestamps help in this regard.
-- 
Duy


[PATCH] Documentation/git-diff: document git diff with 3+ commits

2016-10-26 Thread Michael J Gruber
That one is difficult to discover but super useful, so document it:
Specifying 3 or more commits makes git diff switch to combined diff.

Signed-off-by: Michael J Gruber 
---

Notes:
Note that we have the following now:

'git diff A B' displays 'B minus A'
'git diff A B C' displays 'A minus B' and 'A minus C'

While I know why, that (the implicit '-R' seems unfortunate).

(NB: One has to use '-c' if A is an actual merge commiti, it seems.)

If M is a merge base for A and B, we have:

'git diff A..B' equivalent to 'git diff A B'
in contrast to 'git log A..B' listing commits between M and B only
(without the commits between M and A unless they are "in" B).

I would expect 'git diff M B' here.

'git diff A...B' is equivalent to 'git diff M B'
in contrast to 'git log A...B' listing commits between M and A (marked left)
as well as commits between M and B (marked right).

I would expect 'git diff -c -R M A B' here.

Somehow the positive and negative ends of these ranges don't correspond well
with thinking about diffs as differences between these ends.

[I'm not exact with my use of "between" regarding boundary commits.]

 Documentation/git-diff.txt | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index bbab35fcaf..2047318a27 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -12,6 +12,7 @@ SYNOPSIS
 'git diff' [options] [] [--] [...]
 'git diff' [options] --cached [] [--] [...]
 'git diff' [options]   [--] [...]
+'git diff' [options][...]
 'git diff' [options]  
 'git diff' [options] [--no-index] [--]  
 
@@ -75,9 +76,16 @@ two blob objects, or changes between two files on disk.
"git diff $(git-merge-base A B) B".  You can omit any one
of , which has the same effect as using HEAD instead.
 
+'git diff' [options][...]::
+
+   This is to view a combined diff between the first 
+   and the remaining ones, just like viewing a combined diff
+   for a merge commit (see below) where the first 
+   is the merge commit and the remaining ones are the parents.
+
 Just in case if you are doing something exotic, it should be
 noted that all of the  in the above description, except
-in the last two forms that use ".." notations, can be any
+in the two forms that use ".." notations, can be any
 .
 
 For a more complete list of ways to spell , see
-- 
2.10.1.723.g0f00470



Re: [PATCH 27/36] attr: convert to new threadsafe API

2016-10-26 Thread Johannes Schindelin
Hi Stefan,

On Sat, 22 Oct 2016, Stefan Beller wrote:

> @@ -46,6 +47,19 @@ struct git_attr {
>  static int attr_nr;
>  static struct git_attr *(git_attr_hash[HASHSIZE]);
>  
> +#ifndef NO_PTHREADS
> +
> +static pthread_mutex_t attr_mutex;
> +#define attr_lock()  pthread_mutex_lock(_mutex)
> +#define attr_unlock()pthread_mutex_unlock(_mutex)

This mutex is never initialized. That may work on the system you tested,
but it is incorrect, and it does segfault on Windows. A lot.

I need *at least* something like this to make it stop crashing all over
the test suite:

-- snipsnap --
diff --git a/attr.c b/attr.c
index d5a6aa9..6933504 100644
--- a/attr.c
+++ b/attr.c
@@ -50,7 +50,16 @@ static struct git_attr *(git_attr_hash[HASHSIZE]);
 #ifndef NO_PTHREADS
 
 static pthread_mutex_t attr_mutex;
-#define attr_lock()pthread_mutex_lock(_mutex)
+static inline void attr_lock(void)
+{
+   static int initialized;
+
+   if (!initialized) {
+   pthread_mutex_init(_mutex, NULL);
+   initialized = 1;
+   }
+   pthread_mutex_lock(_mutex);
+}
 #define attr_unlock()pthread_mutex_unlock(_mutex)
 
 #else