Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> Yeah, I can make it work without exposing struct git_attr.
>
> You completely misunderstood me.  "struct git_attr" MUST be visible
> to the users so that they can ask for the name in git_check.attr[0].
>
> What would be nice to hide if you can is the function to intern a
> string into a pointer to struct git_attr, i.e. git_attr() function.

Even though that was not the primary point of my suggestion, I
actually think it is OK to make "struct git_attr" a structure that
is opaque to the users of the API if you wanted to.  The attr_check
structure will have an array of pointers to "struct git_attr", and
the structure definition may be visible to the public attr.h header,
but the API users won't have to be able to dereference the pointer
"struct git_attr *".  git_check.attr[0] would be a pointer to an
opaque structure from API users' point of view, that can be passed
to API function git_attr_name() to read its string.

What is nice to hide is the constructor of the structure.  What it,
i.e. "struct git_attr *git_attr(const char *)", needs to do is to
(1) see if the attribute object with the same name already exists in
the table of "all known attributes in the universe", and if there
is, return that instance, (2) otherwise create a new attribute
object, register it to the table and return it.  And it needs to do
it in a way that is thread-safe.

If we have to give access to it to the API users, then we'd need to
acquire and release the Big Attr Lock per each call.  

The calls to git_attr() you need to make in your implementation will
be made from two codepaths:

 * check_initl() acquires the Big Attr Lock, creates a check struct,
   makes multiple calls to git_attr() to construct the necessary
   git_attr instances to fill the array and then releases the lock,
   so the git_attr() constructor does not have to be protected for
   concurrent access.

 * check_attr() acquires the Big Attr Lock, calls down to
   prepare_attr_stack() as necessary to parse .gitattributes files
   found in the directory hierarchy, which makes calls to git_attr()
   to record the attributes found in the file.  Then it does the
   matching to fill results[] array and releases the lock.  Again,
   git_attr() constructors are called under the lock, so there is no
   need for a separate lock.

If these are the only callpaths that reach git_attr() to construct
new attribute objects, it would mean that you can make this private
to attr subsystem and hide it from the users of the API.

Otherwise, you would need to rename the git_attr() constructor that
used internally under the Big Lock to

static struct git_attr *git_attr_locked(const char *);

that is defined inside attr.c, and then provide the external version
as a thin wrapper that calls it under the Big Lock, i.e.

struct git_attr *git_attr(const char *s)
{
struct git_attr *attr;
take_big_attr_lock();
attr = git_attr_locked(s);
release_big_attr_lock();
return attr;
}

That will have to make the big attr lock busier, and it would be
good if we can avoid it.  That is where my "can we hide git_attr()
constructor?" comes from.



Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> Yeah, I can make it work without exposing struct git_attr.

You completely misunderstood me.  "struct git_attr" MUST be visible
to the users so that they can ask for the name in git_check.attr[0].

What would be nice to hide if you can is the function to intern a
string into a pointer to struct git_attr, i.e. git_attr() function.



Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 5:49 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>  extern struct git_attr *git_attr(const char *);
>> ...
>> +extern void git_attr_check_append(struct git_attr_check *,
>> +   const struct git_attr *);
>
> Another thing.  Do we still need to expose git_attr() to the outside
> callers?  If we change git_attr_check_append() to take "const char *"
> as its second parameter, can we retire it from the public interface?
>
> It being an "intern" function, by definition it is not thread-safe.
> Its use from prepare_attr_stack() inside git_check_attr() that takes
> the Big Attribute Subsystem Lock is safe, but the callers that do
>
> struct git_attr_check *check = ...;
> struct git_attr *text_attr = git_attr("text");
>
> git_attr_check_append(check, text_attr);
>
> would risk racing to register the same entry to the "all the
> attributes in the known universe" table.
>
> If the attribute API does not have to expose git_attr(const char *)
> to the external callers at all, that would be ideal.  Otherwise, we
> would need to rename the current git_attr() that is used internally
> under the Big Lock to
>
> static struct git_attr *git_attr_locked(const char*)
>
> that is defined inside attr.c, and then provide the external version
> as a thin wrapper that calls it under the Big Lock.
>
>

Yeah, I can make it work without exposing struct git_attr.
However then the struct git_attr_check will contain more of
internals exposed. (As the header file did not know the size
of git_attr, the patch under discussion even must use a double pointer
to a git_attr inside the git_attr_check as the git_attr size is unknown)

So I'll look into removing that struct git_attr completely.


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> To explain, you can either have:
> struct git_attr_result result[2];
> or
> struct git_attr_result *result = git_attr_result_alloc(check);
> and both are running just fine in a thread. However you should not
> make that variable static. But maybe that is too much common sense
> and hence confusing.

Yup, if you spelled it out that does make sense, but the description
given in the patch was just misleading.

Thanks.


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

>  extern struct git_attr *git_attr(const char *);
> ...
> +extern void git_attr_check_append(struct git_attr_check *,
> +   const struct git_attr *);

Another thing.  Do we still need to expose git_attr() to the outside
callers?  If we change git_attr_check_append() to take "const char *"
as its second parameter, can we retire it from the public interface?

It being an "intern" function, by definition it is not thread-safe.
Its use from prepare_attr_stack() inside git_check_attr() that takes
the Big Attribute Subsystem Lock is safe, but the callers that do

struct git_attr_check *check = ...;
struct git_attr *text_attr = git_attr("text");

git_attr_check_append(check, text_attr);

would risk racing to register the same entry to the "all the
attributes in the known universe" table.  

If the attribute API does not have to expose git_attr(const char *)
to the external callers at all, that would be ideal.  Otherwise, we
would need to rename the current git_attr() that is used internally
under the Big Lock to 

static struct git_attr *git_attr_locked(const char*)

that is defined inside attr.c, and then provide the external version
as a thin wrapper that calls it under the Big Lock.




Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 5:16 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +* Allocate an array of `struct git_attr_result` either on the stack
>> +  or via `git_attr_result_alloc` on the heap when the result size
>> +  is not known at compile time. The call to initialize
>>the result is not thread safe, because different threads need their
>>own thread local result anyway.
>
> Do you want to keep the last sentence?  "The call to initialize the
> result is not thread safe..."?  Is that true?

I'll drop that sentence, as it overstates the situation.

To explain, you can either have:
struct git_attr_result result[2];
or
struct git_attr_result *result = git_attr_result_alloc(check);
and both are running just fine in a thread. However you should not
make that variable static. But maybe that is too much common sense
and hence confusing.

>
>> @@ -103,7 +105,7 @@ To see how attributes "crlf" and "ident" are set
>> for different paths.
>>  const char *path;
>>  struct git_attr_result result[2];
>>
>> -git_check_attr(path, check, result);
>> +git_check_attr(path, &check, result);
>
> What's the point of this change?  Isn't check typically a pointer
> already?

This ought to go to

git_attr_check_initl(&check, "crlf", "ident", NULL);

instead.

>


Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> +* Allocate an array of `struct git_attr_result` either on the stack
> +  or via `git_attr_result_alloc` on the heap when the result size
> +  is not known at compile time. The call to initialize
>the result is not thread safe, because different threads need their
>own thread local result anyway.

Do you want to keep the last sentence?  "The call to initialize the
result is not thread safe..."?  Is that true?

> @@ -103,7 +105,7 @@ To see how attributes "crlf" and "ident" are set
> for different paths.
>  const char *path;
>  struct git_attr_result result[2];
>
> -git_check_attr(path, check, result);
> +git_check_attr(path, &check, result);

What's the point of this change?  Isn't check typically a pointer
already?



Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
On Wed, Oct 26, 2016 at 4:14 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> @@ -53,19 +57,32 @@ value of the attribute for the path.
>>  Querying Specific Attributes
>>  
>>
>> -* Prepare `struct git_attr_check` using git_attr_check_initl()
>> +* Prepare a `struct git_attr_check` using `git_attr_check_initl()`
>>function, enumerating the names of attributes whose values you are
>>interested in, terminated with a NULL pointer.  Alternatively, an
>> -  empty `struct git_attr_check` can be prepared by calling
>> -  `git_attr_check_alloc()` function and then attributes you want to
>> -  ask about can be added to it with `git_attr_check_append()`
>> -  function.
>> -
>> -* Call `git_check_attr()` to check the attributes for the path.
>> -
>> -* Inspect `git_attr_check` structure to see how each of the
>> -  attribute in the array is defined for the path.
>> -
>> +  empty `struct git_attr_check` as allocated by git_attr_check_alloc()
>> +  can be prepared by calling `git_attr_check_alloc()` function and
>> +  then attributes you want to ask about can be added to it with
>> +  `git_attr_check_append()` function.
>
> I think that my version that was discarded forbade appending once
> you started to use the check for querying, because the check was
> meant to be used as a key for an attr-stack and the check-specific
> attr-stack was planned to keep only the attrs the check is
> interested in (and appending is to change the set of attrs the check
> is interested in, invalidating the attr-stack at that point).
>
> If you lost that restriction, that is good (I didn't check the
> implementation, though).  Otherwise we'd need to say something here.

That restriction still applies. Though I think mentioning it in the
paragraph where we describe querying makes more sense.

> initialization?

done


>
> Grammo?  "either on the stack, or dynamically in the heap"?

done

>
> Having result defined statically is not thread safe for that
> reason.  It is not clear what you mean by "The call to initialize
> the result"; having it on the stack or have one dynamically on the
> heap ought to be thread safe.

done

>> -}
>> + static struct git_attr_check *check;
>> + git_attr_check_initl(check, "crlf", "ident", NULL);
>
> I think you are still missing "&" here.

done

>> + if (check)
>> + return; /* already done */
>>   check = git_attr_check_alloc();
>
> You may want to say that this is thread-unsafe.

It is not; see the implementation:

void git_attr_check_append(struct git_attr_check *check,
   const struct git_attr *attr)
{
int i;
if (check->finalized)
die("BUG: append after git_attr_check structure is finalized");
if (!attr_check_is_dynamic(check))
die("BUG: appending to a statically initialized git_attr_check");
attr_lock();
for (i = 0; i < check->check_nr; i++)
if (check->attr[i] == attr)
break;
if (i == check->check_nr) {
ALLOC_GROW(check->attr, check->check_nr + 1, check->check_alloc);
check->attr[check->check_nr++] = attr;
}
attr_unlock();
}

>> +* Call `git_all_attrs()`.
>
> Hmph, the caller does not know what attribute it is interested in,
> and it is unclear "how" the former needs to be set up from this
> description.  Should it prepare an empty one that can be appended?
>

Good point on clarifying this one. It is just needed to have
NULL pointers around:

struct git_attr_check *check = NULL;
struct git_attr_result *result = NULL;
git_all_attrs(full_path, &local_check, &result);
// proceed as in the case above

All comments from above yield the following diff:

diff --git a/Documentation/technical/api-gitattributes.txt
b/Documentation/technical/api-gitattributes.txt
index 4d8ef93..d221736 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -67,19 +67,21 @@ Querying Specific Attributes
   Both ways with `git_attr_check_initl()` as well as the
   alloc and append route are thread safe, i.e. you can call it
   from different threads at the same time; when check determines
-  the initialzisation is still needed, the threads will use a
+  the initialization is still needed, the threads will use a
   single global mutex to perform the initialization just once, the
   others will wait on the the thread to actually perform the
   initialization.

-* Allocate an array of `struct git_attr_result` either statically on the
-  as a variable on the stack or dynamically via `git_attr_result_alloc`
-  when the result size is not known at compile time. The call to initialize
+* Allocate an array of `struct git_attr_result` either on the stack
+  or via `git_attr_result_alloc` on the heap when the result size
+  is not known at compile time. The call to initialize
   the result is not thread safe, because different threads need their
   own thread local result anyway.

 * Call `git_check_attr()

Re: [PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Junio C Hamano
Stefan Beller  writes:

> @@ -53,19 +57,32 @@ value of the attribute for the path.
>  Querying Specific Attributes
>  
>  
> -* Prepare `struct git_attr_check` using git_attr_check_initl()
> +* Prepare a `struct git_attr_check` using `git_attr_check_initl()`
>function, enumerating the names of attributes whose values you are
>interested in, terminated with a NULL pointer.  Alternatively, an
> -  empty `struct git_attr_check` can be prepared by calling
> -  `git_attr_check_alloc()` function and then attributes you want to
> -  ask about can be added to it with `git_attr_check_append()`
> -  function.
> -
> -* Call `git_check_attr()` to check the attributes for the path.
> -
> -* Inspect `git_attr_check` structure to see how each of the
> -  attribute in the array is defined for the path.
> -
> +  empty `struct git_attr_check` as allocated by git_attr_check_alloc()
> +  can be prepared by calling `git_attr_check_alloc()` function and
> +  then attributes you want to ask about can be added to it with
> +  `git_attr_check_append()` function.

I think that my version that was discarded forbade appending once
you started to use the check for querying, because the check was
meant to be used as a key for an attr-stack and the check-specific
attr-stack was planned to keep only the attrs the check is
interested in (and appending is to change the set of attrs the check
is interested in, invalidating the attr-stack at that point).  

If you lost that restriction, that is good (I didn't check the
implementation, though).  Otherwise we'd need to say something here.

> +  Both ways with `git_attr_check_initl()` as well as the
> +  alloc and append route are thread safe, i.e. you can call it
> +  from different threads at the same time; when check determines
> +  the initialzisation is still needed, the threads will use a

initialization?

> +  single global mutex to perform the initialization just once, the
> +  others will wait on the the thread to actually perform the
> +  initialization.
> +
> +* Allocate an array of `struct git_attr_result` either statically on the
> +  as a variable on the stack or dynamically via `git_attr_result_alloc`

Grammo?  "either on the stack, or dynamically in the heap"?

> +  when the result size is not known at compile time. The call to initialize
> +  the result is not thread safe, because different threads need their
> +  own thread local result anyway.

Having result defined statically is not thread safe for that
reason.  It is not clear what you mean by "The call to initialize
the result"; having it on the stack or have one dynamically on the
heap ought to be thread safe.

> +* Call `git_check_attr()` to check the attributes for the path,
> +  the given `git_attr_result` will be filled with the result.
> +
> +* Inspect each `git_attr_result` structure to see how
> +  each of the attribute in the array is defined for the path.
>  
>  Example
>  ---
> @@ -76,28 +93,23 @@ To see how attributes "crlf" and "ident" are set for 
> different paths.
>we are checking two attributes):
>  
>  
> -static struct git_attr_check *check;
> -static void setup_check(void)
> -{
> - if (check)
> - return; /* already done */
> - check = git_attr_check_initl("crlf", "ident", NULL);
> -}
> + static struct git_attr_check *check;
> + git_attr_check_initl(check, "crlf", "ident", NULL);

I think you are still missing "&" here.

>  
>  
>  . Call `git_check_attr()` with the prepared `struct git_attr_check`:
>  
>  
>   const char *path;
> + struct git_attr_result result[2];
>  
> - setup_check();
> - git_check_attr(path, check);
> + git_check_attr(path, check, result);
>  
>  
> -. Act on `.value` member of the result, left in `check->check[]`:
> +. Act on `result.value[]`:
>  
>  
> - const char *value = check->check[0].value;
> + const char *value = result.value[0];
>  
>   if (ATTR_TRUE(value)) {
>   The attribute is Set, by listing only the name of the
> @@ -123,12 +135,15 @@ the first step in the above would be different.
>  static struct git_attr_check *check;
>  static void setup_check(const char **argv)
>  {
> + if (check)
> + return; /* already done */
>   check = git_attr_check_alloc();

You may want to say that this is thread-unsafe.

>   while (*argv) {
>   struct git_attr *attr = git_attr(*argv);
>   git_attr_check_append(check, attr);
>   argv++;
>   }
> + struct git_attr_result *result = git_attr_result_alloc(check);
>  }
>  
>  
> @@ -138,17 +153,20 @@ Querying All Attributes
>  
>  To get the values of all attributes associated with a file:
>  
> +* Setup a local variables for the question
> +  `struct git_attr_check` as well as a pointer where the result
> +  `struct git_attr_result` will be stored.
> +* Call `git_all_attrs()`.

Hmph, the call

[PATCHv2 1/2] attr: convert to new threadsafe API

2016-10-26 Thread Stefan Beller
This revamps the API of the attr subsystem to be thread safe.
Before we had the question and its results in one struct type.
The typical usage of the API was

static struct git_attr_check *check;

if (!check)
check = git_attr_check_initl("text", NULL);

git_check_attr(path, check);
act_on(check->value[0]);

This has a couple of issues when it comes to thread safety:

* the initialization is racy in this implementation. To make it
  thread safe, we need to acquire a mutex, such that only one
  thread is executing the code in git_attr_check_initl.
  As we do not want to introduce a mutex at each call site,
  this is best done in the attr code. However to do so, we need
  to have access to the `check` variable, i.e. the API has to
  look like
git_attr_check_initl(struct git_attr_check*, ...);
  Then one of the threads calling git_attr_check_initl will
  acquire the mutex and init the `check`, while all other threads
  will wait on the mutex just to realize they're late to the
  party and they'll return with no work done.

* While the check for attributes to be questioned only need to
  be initalized once as that part will be read only after its
  initialisation, the answer may be different for each path.
  Because of that we need to decouple the check and the answer,
  such that each thread can obtain an answer for the path it
  is currently processing.

This commit changes the API and adds locking in
git_attr_check_initl that provides the thread safety for constructing
`struct git_attr_check`.

The usage of the new API will be:

/*
 * The initl call will thread-safely check whether the
 * struct git_attr_check has been initialized. We only
 * want to do the initialization work once, hence we do
 * that work inside a thread safe environment.
 */
static struct git_attr_check *check;
git_attr_check_initl(&check, "text", NULL);

/* We're just asking for one attribute "text". */
git_attr_result myresult[1];

/* Perform the check and act on it: */
git_check_attr(path, check, myresult);
act_on(myresult[0].value);

/*
 * No need to free the check as it is static, hence doesn't leak
 * memory. The result is also static, so no need to free there either.
 */

Signed-off-by: Stefan Beller 
---

In the reroll of this series, I plan to use the patch as-is below:

* Documentation/commit message matches actual implementation
* initialize the attr lock statically. It was uninitialized in the first
  version, which works for me on Linux, but not so in Windows; I split off
  the static initialized mutexes on Windows as a separate patch, see
  
https://public-inbox.org/git/cagz79kzryb-5jgif04btk1v9tgfj-tnahquk+1lb7xeecu7...@mail.gmail.com
  which would be a requirement for this patch.

 Documentation/technical/api-gitattributes.txt |  94 ++---
 archive.c |  11 +-
 attr.c| 143 ++
 attr.h|  71 -
 builtin/check-attr.c  |  35 ---
 builtin/pack-objects.c|  16 +--
 convert.c |  40 +++
 ll-merge.c|  24 +++--
 userdiff.c|  16 +--
 ws.c  |   8 +-
 10 files changed, 280 insertions(+), 178 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt 
b/Documentation/technical/api-gitattributes.txt
index 92fc32a..4d8ef93 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,15 +16,19 @@ Data Structure
of no interest to the calling programs.  The name of the
attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check_elem`::
-
-   This structure represents one attribute and its value.
-
 `struct git_attr_check`::
 
-   This structure represents a collection of `git_attr_check_elem`.
+   This structure represents a collection of `struct git_attrs`.
It is passed to `git_check_attr()` function, specifying the
-   attributes to check, and receives their values.
+   attributes to check, and receives their values into a corresponding
+   `struct git_attr_result`.
+
+`struct git_attr_result`::
+
+   This structure represents one results for a check, such that an
+   array of `struct git_attr_results` corresponds to a
+   `struct git_attr_check`. The answers given in that array are in
+   the the same order as the check struct.
 
 
 Attribute Values
@@ -32,7 +36,7 @@ Attribute Values
 
 An attribute for a path can be in one of four states: Set, Unset,
 Unspecified or set to a string, and `.value` member of `struct
-git_attr_check` records it.  There are three macros to check these:
+git_attr_result` records it.  There are three macro