Re: [PATCHv2 1/2] attr: convert to new threadsafe API
Junio C Hamano writes: > Stefan Beller writes: > >> Yeah, I can make it work without exposing struct git_attr. > > You completely misunderstood me. "struct git_attr" MUST be visible > to the users so that they can ask for the name in git_check.attr[0]. > > What would be nice to hide if you can is the function to intern a > string into a pointer to struct git_attr, i.e. git_attr() function. Even though that was not the primary point of my suggestion, I actually think it is OK to make "struct git_attr" a structure that is opaque to the users of the API if you wanted to. The attr_check structure will have an array of pointers to "struct git_attr", and the structure definition may be visible to the public attr.h header, but the API users won't have to be able to dereference the pointer "struct git_attr *". git_check.attr[0] would be a pointer to an opaque structure from API users' point of view, that can be passed to API function git_attr_name() to read its string. What is nice to hide is the constructor of the structure. What it, i.e. "struct git_attr *git_attr(const char *)", needs to do is to (1) see if the attribute object with the same name already exists in the table of "all known attributes in the universe", and if there is, return that instance, (2) otherwise create a new attribute object, register it to the table and return it. And it needs to do it in a way that is thread-safe. If we have to give access to it to the API users, then we'd need to acquire and release the Big Attr Lock per each call. The calls to git_attr() you need to make in your implementation will be made from two codepaths: * check_initl() acquires the Big Attr Lock, creates a check struct, makes multiple calls to git_attr() to construct the necessary git_attr instances to fill the array and then releases the lock, so the git_attr() constructor does not have to be protected for concurrent access. * check_attr() acquires the Big Attr Lock, calls down to prepare_attr_stack() as necessary to parse .gitattributes files found in the directory hierarchy, which makes calls to git_attr() to record the attributes found in the file. Then it does the matching to fill results[] array and releases the lock. Again, git_attr() constructors are called under the lock, so there is no need for a separate lock. If these are the only callpaths that reach git_attr() to construct new attribute objects, it would mean that you can make this private to attr subsystem and hide it from the users of the API. Otherwise, you would need to rename the git_attr() constructor that used internally under the Big Lock to static struct git_attr *git_attr_locked(const char *); that is defined inside attr.c, and then provide the external version as a thin wrapper that calls it under the Big Lock, i.e. struct git_attr *git_attr(const char *s) { struct git_attr *attr; take_big_attr_lock(); attr = git_attr_locked(s); release_big_attr_lock(); return attr; } That will have to make the big attr lock busier, and it would be good if we can avoid it. That is where my "can we hide git_attr() constructor?" comes from.
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
Stefan Beller writes: > Yeah, I can make it work without exposing struct git_attr. You completely misunderstood me. "struct git_attr" MUST be visible to the users so that they can ask for the name in git_check.attr[0]. What would be nice to hide if you can is the function to intern a string into a pointer to struct git_attr, i.e. git_attr() function.
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
On Wed, Oct 26, 2016 at 5:49 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> extern struct git_attr *git_attr(const char *); >> ... >> +extern void git_attr_check_append(struct git_attr_check *, >> + const struct git_attr *); > > Another thing. Do we still need to expose git_attr() to the outside > callers? If we change git_attr_check_append() to take "const char *" > as its second parameter, can we retire it from the public interface? > > It being an "intern" function, by definition it is not thread-safe. > Its use from prepare_attr_stack() inside git_check_attr() that takes > the Big Attribute Subsystem Lock is safe, but the callers that do > > struct git_attr_check *check = ...; > struct git_attr *text_attr = git_attr("text"); > > git_attr_check_append(check, text_attr); > > would risk racing to register the same entry to the "all the > attributes in the known universe" table. > > If the attribute API does not have to expose git_attr(const char *) > to the external callers at all, that would be ideal. Otherwise, we > would need to rename the current git_attr() that is used internally > under the Big Lock to > > static struct git_attr *git_attr_locked(const char*) > > that is defined inside attr.c, and then provide the external version > as a thin wrapper that calls it under the Big Lock. > > Yeah, I can make it work without exposing struct git_attr. However then the struct git_attr_check will contain more of internals exposed. (As the header file did not know the size of git_attr, the patch under discussion even must use a double pointer to a git_attr inside the git_attr_check as the git_attr size is unknown) So I'll look into removing that struct git_attr completely.
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
Stefan Beller writes: > To explain, you can either have: > struct git_attr_result result[2]; > or > struct git_attr_result *result = git_attr_result_alloc(check); > and both are running just fine in a thread. However you should not > make that variable static. But maybe that is too much common sense > and hence confusing. Yup, if you spelled it out that does make sense, but the description given in the patch was just misleading. Thanks.
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
Stefan Beller writes: > extern struct git_attr *git_attr(const char *); > ... > +extern void git_attr_check_append(struct git_attr_check *, > + const struct git_attr *); Another thing. Do we still need to expose git_attr() to the outside callers? If we change git_attr_check_append() to take "const char *" as its second parameter, can we retire it from the public interface? It being an "intern" function, by definition it is not thread-safe. Its use from prepare_attr_stack() inside git_check_attr() that takes the Big Attribute Subsystem Lock is safe, but the callers that do struct git_attr_check *check = ...; struct git_attr *text_attr = git_attr("text"); git_attr_check_append(check, text_attr); would risk racing to register the same entry to the "all the attributes in the known universe" table. If the attribute API does not have to expose git_attr(const char *) to the external callers at all, that would be ideal. Otherwise, we would need to rename the current git_attr() that is used internally under the Big Lock to static struct git_attr *git_attr_locked(const char*) that is defined inside attr.c, and then provide the external version as a thin wrapper that calls it under the Big Lock.
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
On Wed, Oct 26, 2016 at 5:16 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> +* Allocate an array of `struct git_attr_result` either on the stack >> + or via `git_attr_result_alloc` on the heap when the result size >> + is not known at compile time. The call to initialize >>the result is not thread safe, because different threads need their >>own thread local result anyway. > > Do you want to keep the last sentence? "The call to initialize the > result is not thread safe..."? Is that true? I'll drop that sentence, as it overstates the situation. To explain, you can either have: struct git_attr_result result[2]; or struct git_attr_result *result = git_attr_result_alloc(check); and both are running just fine in a thread. However you should not make that variable static. But maybe that is too much common sense and hence confusing. > >> @@ -103,7 +105,7 @@ To see how attributes "crlf" and "ident" are set >> for different paths. >> const char *path; >> struct git_attr_result result[2]; >> >> -git_check_attr(path, check, result); >> +git_check_attr(path, &check, result); > > What's the point of this change? Isn't check typically a pointer > already? This ought to go to git_attr_check_initl(&check, "crlf", "ident", NULL); instead. >
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
Stefan Beller writes: > +* Allocate an array of `struct git_attr_result` either on the stack > + or via `git_attr_result_alloc` on the heap when the result size > + is not known at compile time. The call to initialize >the result is not thread safe, because different threads need their >own thread local result anyway. Do you want to keep the last sentence? "The call to initialize the result is not thread safe..."? Is that true? > @@ -103,7 +105,7 @@ To see how attributes "crlf" and "ident" are set > for different paths. > const char *path; > struct git_attr_result result[2]; > > -git_check_attr(path, check, result); > +git_check_attr(path, &check, result); What's the point of this change? Isn't check typically a pointer already?
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
On Wed, Oct 26, 2016 at 4:14 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> @@ -53,19 +57,32 @@ value of the attribute for the path. >> Querying Specific Attributes >> >> >> -* Prepare `struct git_attr_check` using git_attr_check_initl() >> +* Prepare a `struct git_attr_check` using `git_attr_check_initl()` >>function, enumerating the names of attributes whose values you are >>interested in, terminated with a NULL pointer. Alternatively, an >> - empty `struct git_attr_check` can be prepared by calling >> - `git_attr_check_alloc()` function and then attributes you want to >> - ask about can be added to it with `git_attr_check_append()` >> - function. >> - >> -* Call `git_check_attr()` to check the attributes for the path. >> - >> -* Inspect `git_attr_check` structure to see how each of the >> - attribute in the array is defined for the path. >> - >> + empty `struct git_attr_check` as allocated by git_attr_check_alloc() >> + can be prepared by calling `git_attr_check_alloc()` function and >> + then attributes you want to ask about can be added to it with >> + `git_attr_check_append()` function. > > I think that my version that was discarded forbade appending once > you started to use the check for querying, because the check was > meant to be used as a key for an attr-stack and the check-specific > attr-stack was planned to keep only the attrs the check is > interested in (and appending is to change the set of attrs the check > is interested in, invalidating the attr-stack at that point). > > If you lost that restriction, that is good (I didn't check the > implementation, though). Otherwise we'd need to say something here. That restriction still applies. Though I think mentioning it in the paragraph where we describe querying makes more sense. > initialization? done > > Grammo? "either on the stack, or dynamically in the heap"? done > > Having result defined statically is not thread safe for that > reason. It is not clear what you mean by "The call to initialize > the result"; having it on the stack or have one dynamically on the > heap ought to be thread safe. done >> -} >> + static struct git_attr_check *check; >> + git_attr_check_initl(check, "crlf", "ident", NULL); > > I think you are still missing "&" here. done >> + if (check) >> + return; /* already done */ >> check = git_attr_check_alloc(); > > You may want to say that this is thread-unsafe. It is not; see the implementation: void git_attr_check_append(struct git_attr_check *check, const struct git_attr *attr) { int i; if (check->finalized) die("BUG: append after git_attr_check structure is finalized"); if (!attr_check_is_dynamic(check)) die("BUG: appending to a statically initialized git_attr_check"); attr_lock(); for (i = 0; i < check->check_nr; i++) if (check->attr[i] == attr) break; if (i == check->check_nr) { ALLOC_GROW(check->attr, check->check_nr + 1, check->check_alloc); check->attr[check->check_nr++] = attr; } attr_unlock(); } >> +* Call `git_all_attrs()`. > > Hmph, the caller does not know what attribute it is interested in, > and it is unclear "how" the former needs to be set up from this > description. Should it prepare an empty one that can be appended? > Good point on clarifying this one. It is just needed to have NULL pointers around: struct git_attr_check *check = NULL; struct git_attr_result *result = NULL; git_all_attrs(full_path, &local_check, &result); // proceed as in the case above All comments from above yield the following diff: diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt index 4d8ef93..d221736 100644 --- a/Documentation/technical/api-gitattributes.txt +++ b/Documentation/technical/api-gitattributes.txt @@ -67,19 +67,21 @@ Querying Specific Attributes Both ways with `git_attr_check_initl()` as well as the alloc and append route are thread safe, i.e. you can call it from different threads at the same time; when check determines - the initialzisation is still needed, the threads will use a + the initialization is still needed, the threads will use a single global mutex to perform the initialization just once, the others will wait on the the thread to actually perform the initialization. -* Allocate an array of `struct git_attr_result` either statically on the - as a variable on the stack or dynamically via `git_attr_result_alloc` - when the result size is not known at compile time. The call to initialize +* Allocate an array of `struct git_attr_result` either on the stack + or via `git_attr_result_alloc` on the heap when the result size + is not known at compile time. The call to initialize the result is not thread safe, because different threads need their own thread local result anyway. * Call `git_check_attr()
Re: [PATCHv2 1/2] attr: convert to new threadsafe API
Stefan Beller writes: > @@ -53,19 +57,32 @@ value of the attribute for the path. > Querying Specific Attributes > > > -* Prepare `struct git_attr_check` using git_attr_check_initl() > +* Prepare a `struct git_attr_check` using `git_attr_check_initl()` >function, enumerating the names of attributes whose values you are >interested in, terminated with a NULL pointer. Alternatively, an > - empty `struct git_attr_check` can be prepared by calling > - `git_attr_check_alloc()` function and then attributes you want to > - ask about can be added to it with `git_attr_check_append()` > - function. > - > -* Call `git_check_attr()` to check the attributes for the path. > - > -* Inspect `git_attr_check` structure to see how each of the > - attribute in the array is defined for the path. > - > + empty `struct git_attr_check` as allocated by git_attr_check_alloc() > + can be prepared by calling `git_attr_check_alloc()` function and > + then attributes you want to ask about can be added to it with > + `git_attr_check_append()` function. I think that my version that was discarded forbade appending once you started to use the check for querying, because the check was meant to be used as a key for an attr-stack and the check-specific attr-stack was planned to keep only the attrs the check is interested in (and appending is to change the set of attrs the check is interested in, invalidating the attr-stack at that point). If you lost that restriction, that is good (I didn't check the implementation, though). Otherwise we'd need to say something here. > + Both ways with `git_attr_check_initl()` as well as the > + alloc and append route are thread safe, i.e. you can call it > + from different threads at the same time; when check determines > + the initialzisation is still needed, the threads will use a initialization? > + single global mutex to perform the initialization just once, the > + others will wait on the the thread to actually perform the > + initialization. > + > +* Allocate an array of `struct git_attr_result` either statically on the > + as a variable on the stack or dynamically via `git_attr_result_alloc` Grammo? "either on the stack, or dynamically in the heap"? > + when the result size is not known at compile time. The call to initialize > + the result is not thread safe, because different threads need their > + own thread local result anyway. Having result defined statically is not thread safe for that reason. It is not clear what you mean by "The call to initialize the result"; having it on the stack or have one dynamically on the heap ought to be thread safe. > +* Call `git_check_attr()` to check the attributes for the path, > + the given `git_attr_result` will be filled with the result. > + > +* Inspect each `git_attr_result` structure to see how > + each of the attribute in the array is defined for the path. > > Example > --- > @@ -76,28 +93,23 @@ To see how attributes "crlf" and "ident" are set for > different paths. >we are checking two attributes): > > > -static struct git_attr_check *check; > -static void setup_check(void) > -{ > - if (check) > - return; /* already done */ > - check = git_attr_check_initl("crlf", "ident", NULL); > -} > + static struct git_attr_check *check; > + git_attr_check_initl(check, "crlf", "ident", NULL); I think you are still missing "&" here. > > > . Call `git_check_attr()` with the prepared `struct git_attr_check`: > > > const char *path; > + struct git_attr_result result[2]; > > - setup_check(); > - git_check_attr(path, check); > + git_check_attr(path, check, result); > > > -. Act on `.value` member of the result, left in `check->check[]`: > +. Act on `result.value[]`: > > > - const char *value = check->check[0].value; > + const char *value = result.value[0]; > > if (ATTR_TRUE(value)) { > The attribute is Set, by listing only the name of the > @@ -123,12 +135,15 @@ the first step in the above would be different. > static struct git_attr_check *check; > static void setup_check(const char **argv) > { > + if (check) > + return; /* already done */ > check = git_attr_check_alloc(); You may want to say that this is thread-unsafe. > while (*argv) { > struct git_attr *attr = git_attr(*argv); > git_attr_check_append(check, attr); > argv++; > } > + struct git_attr_result *result = git_attr_result_alloc(check); > } > > > @@ -138,17 +153,20 @@ Querying All Attributes > > To get the values of all attributes associated with a file: > > +* Setup a local variables for the question > + `struct git_attr_check` as well as a pointer where the result > + `struct git_attr_result` will be stored. > +* Call `git_all_attrs()`. Hmph, the call
[PATCHv2 1/2] attr: convert to new threadsafe API
This revamps the API of the attr subsystem to be thread safe. Before we had the question and its results in one struct type. The typical usage of the API was static struct git_attr_check *check; if (!check) check = git_attr_check_initl("text", NULL); git_check_attr(path, check); act_on(check->value[0]); This has a couple of issues when it comes to thread safety: * the initialization is racy in this implementation. To make it thread safe, we need to acquire a mutex, such that only one thread is executing the code in git_attr_check_initl. As we do not want to introduce a mutex at each call site, this is best done in the attr code. However to do so, we need to have access to the `check` variable, i.e. the API has to look like git_attr_check_initl(struct git_attr_check*, ...); Then one of the threads calling git_attr_check_initl will acquire the mutex and init the `check`, while all other threads will wait on the mutex just to realize they're late to the party and they'll return with no work done. * While the check for attributes to be questioned only need to be initalized once as that part will be read only after its initialisation, the answer may be different for each path. Because of that we need to decouple the check and the answer, such that each thread can obtain an answer for the path it is currently processing. This commit changes the API and adds locking in git_attr_check_initl that provides the thread safety for constructing `struct git_attr_check`. The usage of the new API will be: /* * The initl call will thread-safely check whether the * struct git_attr_check has been initialized. We only * want to do the initialization work once, hence we do * that work inside a thread safe environment. */ static struct git_attr_check *check; git_attr_check_initl(&check, "text", NULL); /* We're just asking for one attribute "text". */ git_attr_result myresult[1]; /* Perform the check and act on it: */ git_check_attr(path, check, myresult); act_on(myresult[0].value); /* * No need to free the check as it is static, hence doesn't leak * memory. The result is also static, so no need to free there either. */ Signed-off-by: Stefan Beller --- In the reroll of this series, I plan to use the patch as-is below: * Documentation/commit message matches actual implementation * initialize the attr lock statically. It was uninitialized in the first version, which works for me on Linux, but not so in Windows; I split off the static initialized mutexes on Windows as a separate patch, see https://public-inbox.org/git/cagz79kzryb-5jgif04btk1v9tgfj-tnahquk+1lb7xeecu7...@mail.gmail.com which would be a requirement for this patch. Documentation/technical/api-gitattributes.txt | 94 ++--- archive.c | 11 +- attr.c| 143 ++ attr.h| 71 - builtin/check-attr.c | 35 --- builtin/pack-objects.c| 16 +-- convert.c | 40 +++ ll-merge.c| 24 +++-- userdiff.c| 16 +-- ws.c | 8 +- 10 files changed, 280 insertions(+), 178 deletions(-) diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt index 92fc32a..4d8ef93 100644 --- a/Documentation/technical/api-gitattributes.txt +++ b/Documentation/technical/api-gitattributes.txt @@ -16,15 +16,19 @@ Data Structure of no interest to the calling programs. The name of the attribute can be retrieved by calling `git_attr_name()`. -`struct git_attr_check_elem`:: - - This structure represents one attribute and its value. - `struct git_attr_check`:: - This structure represents a collection of `git_attr_check_elem`. + This structure represents a collection of `struct git_attrs`. It is passed to `git_check_attr()` function, specifying the - attributes to check, and receives their values. + attributes to check, and receives their values into a corresponding + `struct git_attr_result`. + +`struct git_attr_result`:: + + This structure represents one results for a check, such that an + array of `struct git_attr_results` corresponds to a + `struct git_attr_check`. The answers given in that array are in + the the same order as the check struct. Attribute Values @@ -32,7 +36,7 @@ Attribute Values An attribute for a path can be in one of four states: Set, Unset, Unspecified or set to a string, and `.value` member of `struct -git_attr_check` records it. There are three macros to check these: +git_attr_result` records it. There are three macro