Re: [PATCH] attr: convert to new threadsafe API

2016-10-28 Thread Johannes Sixt

Am 28.10.2016 um 00:15 schrieb Stefan Beller:

* use attr_start on Windows to dynamically initialize the Single Big Attr Mutex


I'm sorry, I didn't find time to test the patch on Windows. I won't be 
back at my Windows box before Wednesday.


-- Hannes



Re: [PATCH] attr: convert to new threadsafe API

2016-10-28 Thread Junio C Hamano
Junio C Hamano  writes:

>> +if (check)
>> +return; /* already done */
>>  check = git_attr_check_alloc();
>>  while (*argv) {
>>  struct git_attr *attr = git_attr(*argv);
>>  git_attr_check_append(check, attr);

I thought you made git_attr() constructor unavailable, so
check_append() would just get a "const char *" instead?

>>  argv++;
>>  }
>> +struct git_attr_result *result = git_attr_result_alloc(check);
>
> This does not look like thread-safe.
>
> I could understand it if the calling convention were like this,
> though:
>
>   if (git_attr_check_alloc()) {
>   while (*argv) {
>   ... append ...
>   }
>   git_attr_check_finished_appending();
>   }
>   result = result_alloc();
>
> In this variant, git_attr_check_alloc() is responsible for ensuring
> that the "check" is allocated only once just like _initl() is, and
> at the same time, it makes simultanous callers wait until the first
> caller who appends to the singleton check instance declares that it
> finished appending.  The return value signals if you are the first
> caller (who is responsible for populating the check and for
> declaring the check is ready to use at the end of appending).
> Everybody else waits while the first caller is doing the if (...) {
> } thing, and then receives false, at which time everybody (including
> the first caller) goes on and allocating its own result and start
> making queries.

Having said that, how flexible does the "alloc then append" side of
API have to be in the envisioned set of callers?  Is it expected
that it wouldn't be too hard to arrange them so that they have an
array of "const char *"s when they need to initialize/populate a
check?  If that is the case, instead of the "alloc and have others
wait" illustrated above, it may be a lot simpler to give _initv()
variant to them and be done with it, i.e. the above sample caller
would become just:

git_attr_check_initv(, argv);
result = git_attr_result_alloc();

would that be too limiting?  The use of "alloc then append" pattern
in "git check-attr" done in your patch seems to fit the _initv()
well, and the "pathspec with attr match" that appears later in the
series also has all the attributes it needs to query available by
the time it wants to allocate and append to create an instance of
"check", I would think, so the _initv() might be sufficient as a
public API.


Re: [PATCH] attr: convert to new threadsafe API

2016-10-28 Thread Junio C Hamano
Stefan Beller  writes:

> +* Prepare a `struct git_attr_check` using `git_attr_check_initl()`
>function, enumerating the names of attributes whose values you are
>interested in, terminated with a NULL pointer.  Alternatively, an
> -  empty `struct git_attr_check` can be prepared by calling
> -  `git_attr_check_alloc()` function and then attributes you want to
> -  ask about can be added to it with `git_attr_check_append()`
> -  function.
> -
> -* Call `git_check_attr()` to check the attributes for the path.
> -
> -* Inspect `git_attr_check` structure to see how each of the
> -  attribute in the array is defined for the path.
> -
> +  empty `struct git_attr_check` as allocated by git_attr_check_alloc()

Need to drop "as allocated by git_attr_check_alloc()" here.

> +  can be prepared by calling `git_attr_check_alloc()` function and
> +  then attributes you want to ask about can be added to it with
> +  `git_attr_check_append()` function.
> +  Both ways with `git_attr_check_initl()` as well as the
> +  alloc and append route are thread safe, i.e. you can call it
> +  from different threads at the same time; when check determines
> +  the initialization is still needed, the threads will use a
> +  single global mutex to perform the initialization just once, the
> +  others will wait on the the thread to actually perform the
> +  initialization.

I have some comments on the example in the doc on the "alloc-append"
side.  _initl() side looks OK.

> + static struct git_attr_check *check;
> + git_attr_check_initl(, "crlf", "ident", NULL);

OK.

>   const char *path;
> + struct git_attr_result result[2];
>  
> + git_check_attr(path, check, result);

OK.  The above two may be easier to understand if they were a single
example, though.

> +. Act on `result.value[]`:
>  
>  
> - const char *value = check->check[0].value;
> + const char *value = result.value[0];

OK.

> @@ -123,12 +135,15 @@ the first step in the above would be different.
>  static struct git_attr_check *check;
>  static void setup_check(const char **argv)
>  {
> + if (check)
> + return; /* already done */
>   check = git_attr_check_alloc();
>   while (*argv) {
>   struct git_attr *attr = git_attr(*argv);
>   git_attr_check_append(check, attr);
>   argv++;
>   }
> + struct git_attr_result *result = git_attr_result_alloc(check);

This does not look like thread-safe.

I could understand it if the calling convention were like this,
though:

if (git_attr_check_alloc()) {
while (*argv) {
... append ...
}
git_attr_check_finished_appending();
}
result = result_alloc();

In this variant, git_attr_check_alloc() is responsible for ensuring
that the "check" is allocated only once just like _initl() is, and
at the same time, it makes simultanous callers wait until the first
caller who appends to the singleton check instance declares that it
finished appending.  The return value signals if you are the first
caller (who is responsible for populating the check and for
declaring the check is ready to use at the end of appending).
Everybody else waits while the first caller is doing the if (...) {
} thing, and then receives false, at which time everybody (including
the first caller) goes on and allocating its own result and start
making queries.

> +* Setup a local variables for the question
> +  `struct git_attr_check` as well as a pointer where the result
> +  `struct git_attr_result` will be stored. Both should be initialized
> +  to NULL.
> +
> +
> +  struct git_attr_check *check = NULL;
> +  struct git_attr_result *result = NULL;
> +
> +
> +* Call `git_all_attrs()`.
>  
> +
> +  git_all_attrs(full_path, , );
> +

OK.

Thanks.


Re: [PATCH] attr: convert to new threadsafe API

2016-10-28 Thread Johannes Schindelin
Hi Stefan,

On Thu, 27 Oct 2016, Stefan Beller wrote:

> * use attr_start on Windows to dynamically initialize the Single Big Attr 
> Mutex

I would have preferred that call in common-main.c, but whatevs...

Thanks you for fixing the bug,
Dscho


[PATCH] attr: convert to new threadsafe API

2016-10-27 Thread Stefan Beller
This revamps the API of the attr subsystem to be thread safe.
Before we had the question and its results in one struct type.
The typical usage of the API was

static struct git_attr_check *check;

if (!check)
check = git_attr_check_initl("text", NULL);

git_check_attr(path, check);
act_on(check->value[0]);

This has a couple of issues when it comes to thread safety:

* the initialization is racy in this implementation. To make it
  thread safe, we need to acquire a mutex, such that only one
  thread is executing the code in git_attr_check_initl.
  As we do not want to introduce a mutex at each call site,
  this is best done in the attr code. However to do so, we need
  to have access to the `check` variable, i.e. the API has to
  look like
git_attr_check_initl(struct git_attr_check**, ...);
  Then one of the threads calling git_attr_check_initl will
  acquire the mutex and init the `check`, while all other threads
  will wait on the mutex just to realize they're late to the
  party and they'll return with no work done.

* While the check for attributes to be questioned only need to
  be initalized once as that part will be read only after its
  initialisation, the answer may be different for each path.
  Because of that we need to decouple the check and the answer,
  such that each thread can obtain an answer for the path it
  is currently processing.

This commit changes the API and adds locking in
git_attr_check_initl that provides the thread safety for constructing
`struct git_attr_check`.

The usage of the new API will be:

/*
 * The initl call will thread-safely check whether the
 * struct git_attr_check has been initialized. We only
 * want to do the initialization work once, hence we do
 * that work inside a thread safe environment.
 */
static struct git_attr_check *check;
git_attr_check_initl(, "text", NULL);

/* We're just asking for one attribute "text". */
git_attr_result myresult[1];

/* Perform the check and act on it: */
git_check_attr(path, check, myresult);
act_on(myresult[0].value);

/*
 * No need to free the check as it is static, hence doesn't leak
 * memory. The result is also static, so no need to free there either.
 */

Signed-off-by: Stefan Beller 
---

* use attr_start on Windows to dynamically initialize the Single Big Attr Mutex
* rewrote the documentation by Junios guiding comments
* interned struct git_attr, i.e. you hand "const char *" to the attr API, and
  internally it will make sense of it. When asking git_all_attrs, however you
  still need to know about git_attrs as you need to find out the name of the
  attribute via `git_attr_name(struct git_attr *)`.
  
If there are no other huge design discussions, I'll reroll the whole series.

Thanks,
Stefan

 Documentation/technical/api-gitattributes.txt | 104 +++---
 archive.c |  11 +-
 attr.c| 150 ++
 attr.h|  76 +++--
 builtin/check-attr.c  |  47 
 builtin/pack-objects.c|  16 +--
 compat/mingw.c|   3 +
 convert.c |  40 +++
 ll-merge.c|  24 +++--
 userdiff.c|  16 +--
 ws.c  |   8 +-
 11 files changed, 305 insertions(+), 190 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt 
b/Documentation/technical/api-gitattributes.txt
index 92fc32a..ba04c30 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,15 +16,19 @@ Data Structure
of no interest to the calling programs.  The name of the
attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check_elem`::
-
-   This structure represents one attribute and its value.
-
 `struct git_attr_check`::
 
-   This structure represents a collection of `git_attr_check_elem`.
+   This structure represents a collection of `struct git_attrs`.
It is passed to `git_check_attr()` function, specifying the
-   attributes to check, and receives their values.
+   attributes to check, and receives their values into a corresponding
+   `struct git_attr_result`.
+
+`struct git_attr_result`::
+
+   This structure represents one results for a check, such that an
+   array of `struct git_attr_results` corresponds to a
+   `struct git_attr_check`. The answers given in that array are in
+   the the same order as the check struct.
 
 
 Attribute Values
@@ -32,7 +36,7 @@ Attribute Values
 
 An attribute for a path can be in one of four states: Set, Unset,
 Unspecified or set to a string, and `.value` member of `struct
-git_attr_check` records