Re: [RFC/PATCH] attr: Document a new possible thread safe API

2016-10-05 Thread Junio C Hamano
Stefan Beller  writes:

> I think so, instead of resending the documentation, maybe the header
> file shows that we're on the same page, I converted everything except
> attr.c to follow this header attr.h:

OK.  The function signature of git_check_attr() looks suspect (it
does not match the "typical case" illustration in the message you
are responding to), but other than that I think this matches my
expectation.

Thanks for taking this over.


Re: [RFC/PATCH] attr: Document a new possible thread safe API

2016-10-05 Thread Stefan Beller
On Wed, Oct 5, 2016 at 10:00 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> I thought about that, but as I concluded that the get_all_attrs doesn't need
>> conversion to a threading environment, we can keep it as is.
>
> I agree that it is OK for get_all_attrs() to use its own way to ask
> a question and receive an answer to it, that is different from how
> git_check_attr() asks its question.  The threading-support for it is
> an unrelated issue, though (not that I think it needs to be run from
> a multi-threaded caller).
>
>>> ...  I'd expect the most
>>> typical caller to be
>>>
>>> static struct git_attr_check *check;
>>> struct git_attr_result result[2]; /* we want two */
>>>
>>> git_attr_check_initl(&check, "crlf", "ident", NULL);
>>> git_check_attr(path, check, result);
>>> /* result[0] has "crlf", result[1] has "ident" */
>>>
>>> or something like that.
>>
>> I see, that seems to be a clean API. So git_attr_check_initl
>> will lock the mutex and once it got the mutex it can either
>> * return early as someone else did the work
>> * needs to do the actual work
>> and then unlock. In any case the work was done.
>>
>> git_check_attr, which runs in all threads points to the same check,
>> but gets the different results.
>
> Yeah, something along that line.  It seems that we are now on the
> same page?
>

I think so, instead of resending the documentation, maybe the header
file shows that we're on the same page, I converted everything except
attr.c to follow this header attr.h:

---8<---
#ifndef ATTR_H
#define ATTR_H

/* An attribute is a pointer to this opaque structure */
struct git_attr;

/*
 * Given a string, return the gitattribute object that
 * corresponds to it.
 */
extern struct git_attr *git_attr(const char *);

/*
 * Return the name of the attribute represented by the argument.  The
 * return value is a pointer to a null-delimited string that is part
 * of the internal data structure; it should not be modified or freed.
 */
extern const char *git_attr_name(const struct git_attr *);

extern int attr_name_valid(const char *name, size_t namelen);
extern void invalid_attr_name_message(struct strbuf *, const char *, int);

/* Internal use */
extern const char git_attr__true[];
extern const char git_attr__false[];

/* For public to check git_attr_check results */
#define ATTR_TRUE(v) ((v) == git_attr__true)
#define ATTR_FALSE(v) ((v) == git_attr__false)
#define ATTR_UNSET(v) ((v) == NULL)

struct git_attr_check {
int finalized;
int check_nr;
int check_alloc;
struct git_attr **attr;
};

struct git_attr_result {
int check_nr;
int check_alloc;
const char **value;
};

/*
 * Initialize the `git_attr_check` via one of the following three functions:
 *
 * git_attr_check_alloc allocates an empty check, add more attributes via
 *  git_attr_check_append.
 * git_all_attrsallocates a check and fills in all attributes that
 *  are set for the given path.
 * git_attr_check_initl takes a pointer to where the check will be initialized,
 *  followed by all attributes that are to be checked.
 *  This makes it potentially thread safe as it could
 *  internally have a mutex for that memory location.
 *  Currently it is not thread safe!
 */
extern struct git_attr_check *git_attr_check_alloc(void);
extern void git_attr_check_append(struct git_attr_check *, const
struct git_attr *);
extern struct git_attr_check *git_all_attrs(const char *path);
extern void git_attr_check_initl(struct git_attr_check **, const char *, ...);

/* Query a path for its attributes */
extern struct git_attr_result *git_check_attr(const char *path,
 struct git_attr_check *);

/* internal? */
extern void git_attr_check_clear(struct git_attr_check *);

/* After use free the check */
extern void git_attr_check_free(struct git_attr_check *);

enum git_attr_direction {
GIT_ATTR_CHECKIN,
GIT_ATTR_CHECKOUT,
GIT_ATTR_INDEX
};
void git_attr_set_direction(enum git_attr_direction, struct index_state *);

#endif /* ATTR_H */


Re: [RFC/PATCH] attr: Document a new possible thread safe API

2016-10-05 Thread Junio C Hamano
Stefan Beller  writes:

> I thought about that, but as I concluded that the get_all_attrs doesn't need
> conversion to a threading environment, we can keep it as is.

I agree that it is OK for get_all_attrs() to use its own way to ask
a question and receive an answer to it, that is different from how
git_check_attr() asks its question.  The threading-support for it is
an unrelated issue, though (not that I think it needs to be run from
a multi-threaded caller).

>> ...  I'd expect the most
>> typical caller to be
>>
>> static struct git_attr_check *check;
>> struct git_attr_result result[2]; /* we want two */
>>
>> git_attr_check_initl(&check, "crlf", "ident", NULL);
>> git_check_attr(path, check, result);
>> /* result[0] has "crlf", result[1] has "ident" */
>>
>> or something like that.
>
> I see, that seems to be a clean API. So git_attr_check_initl
> will lock the mutex and once it got the mutex it can either
> * return early as someone else did the work
> * needs to do the actual work
> and then unlock. In any case the work was done.
>
> git_check_attr, which runs in all threads points to the same check,
> but gets the different results.

Yeah, something along that line.  It seems that we are now on the
same page?

Thanks.




Re: [RFC/PATCH] attr: Document a new possible thread safe API

2016-10-04 Thread Stefan Beller
On Tue, Oct 4, 2016 at 4:13 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/Documentation/technical/api-gitattributes.txt 
>> b/Documentation/technical/api-gitattributes.txt
>> index 92fc32a..940617e 100644
>> --- a/Documentation/technical/api-gitattributes.txt
>> +++ b/Documentation/technical/api-gitattributes.txt
>> @@ -59,7 +59,10 @@ Querying Specific Attributes
>>empty `struct git_attr_check` can be prepared by calling
>>`git_attr_check_alloc()` function and then attributes you want to
>>ask about can be added to it with `git_attr_check_append()`
>> -  function.
>> +  function. git_attr_check_initl is thread safe, i.e. you can call it
>> +  from different threads at the same time; internally however only one
>> +  call at a time is processed. If the calls from different threads have
>> +  the same arguments, the returned `git_attr_check` may be the same.
>
> I do not think this is enough.  Look at the example for _initl() and
> notice that it keeps the "singleton static check that is initialized
> by the very first caller if the caller notices it is NULL" pattern.
>
> One way to hide that may be to pass the address of that singleton
> pointer to _initl(), so that it can do the "has it been initialized?
> If not, let's prepare the thing" under lock.

Oh, I see. Yeah that makes sense.

>
>> @@ -89,15 +92,21 @@ static void setup_check(void)
>>
>>  
>>   const char *path;
>> + struct git_attr_check *result;
>>
>>   setup_check();
>> - git_check_attr(path, check);
>> + result = git_check_attr(path, check);
>
> I haven't formed a firm opinion, but I suspect your thinking might
> be clouded by too much staring at the current implementation that
> has , pairs inside git_attr_check.  Traditionally, the
> attr subsystem used the same type for the query question and the
> query answer the same type, but it does not have to stay to be the
> case at all.  Have you considered that we are allowed to make these
> two types distinct?

I thought about that, but as I concluded that the get_all_attrs doesn't need
conversion to a threading environment, we can keep it as is.

When keeping the get_all_attrs as is, we need to keep the data structures
as is, (i.e. key,value pair inside git_check_attr), so introducing a new
data type seemed not useful for the threaded part.

>  A caller can share the same question instance
> (i.e. the set of interned , in git_attr_check) with other
> threads as that is a read-only thing, but each of the callers would
> want to have the result array on its own stack if possible
> (e.g. when asking for a known set of attributes, which is the
> majority of the case) to avoid allocation cost.  I'd expect the most
> typical caller to be
>
> static struct git_attr_check *check;
> struct git_attr_result result[2]; /* we want two */
>
> git_attr_check_initl(&check, "crlf", "ident", NULL);
> git_check_attr(path, check, result);
> /* result[0] has "crlf", result[1] has "ident" */
>
> or something like that.

I see, that seems to be a clean API. So git_attr_check_initl
will lock the mutex and once it got the mutex it can either
* return early as someone else did the work
* needs to do the actual work
and then unlock. In any case the work was done.

git_check_attr, which runs in all threads points to the same check,
but gets the different results.

Ok, I'll go in that direction then.

Thanks,
Stefan


Re: [RFC/PATCH] attr: Document a new possible thread safe API

2016-10-04 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/Documentation/technical/api-gitattributes.txt 
> b/Documentation/technical/api-gitattributes.txt
> index 92fc32a..940617e 100644
> --- a/Documentation/technical/api-gitattributes.txt
> +++ b/Documentation/technical/api-gitattributes.txt
> @@ -59,7 +59,10 @@ Querying Specific Attributes
>empty `struct git_attr_check` can be prepared by calling
>`git_attr_check_alloc()` function and then attributes you want to
>ask about can be added to it with `git_attr_check_append()`
> -  function.
> +  function. git_attr_check_initl is thread safe, i.e. you can call it
> +  from different threads at the same time; internally however only one
> +  call at a time is processed. If the calls from different threads have
> +  the same arguments, the returned `git_attr_check` may be the same.

I do not think this is enough.  Look at the example for _initl() and
notice that it keeps the "singleton static check that is initialized
by the very first caller if the caller notices it is NULL" pattern.

One way to hide that may be to pass the address of that singleton
pointer to _initl(), so that it can do the "has it been initialized?
If not, let's prepare the thing" under lock.

> @@ -89,15 +92,21 @@ static void setup_check(void)
>  
>  
>   const char *path;
> + struct git_attr_check *result;
>  
>   setup_check();
> - git_check_attr(path, check);
> + result = git_check_attr(path, check);

I haven't formed a firm opinion, but I suspect your thinking might
be clouded by too much staring at the current implementation that
has , pairs inside git_attr_check.  Traditionally, the
attr subsystem used the same type for the query question and the
query answer the same type, but it does not have to stay to be the
case at all.  Have you considered that we are allowed to make these
two types distinct?  A caller can share the same question instance
(i.e. the set of interned , in git_attr_check) with other
threads as that is a read-only thing, but each of the callers would
want to have the result array on its own stack if possible
(e.g. when asking for a known set of attributes, which is the
majority of the case) to avoid allocation cost.  I'd expect the most
typical caller to be

static struct git_attr_check *check;
struct git_attr_result result[2]; /* we want two */

git_attr_check_initl(&check, "crlf", "ident", NULL);
git_check_attr(path, check, result);
/* result[0] has "crlf", result[1] has "ident" */

or something like that.