Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin

24.10.2017 22:52, Jeff King wrote:
> On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ågren wrote:
> 
>> On 24 October 2017 at 18:45, Eric Sunshine  wrote:
>>> On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
 On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
 wrote:
> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
> init_merge_options().
> Consequential call of getenv() may return NULL pointer and strtol() 
> crashes.
> However the stored pointer to the obtained getenv() result may be 
> invalidated
> by some other getenv() call from another thread as getenv() is not 
> thread-safe.
>>
>> I'm having trouble wrapping my head around this. Under which
>> circumstances could the second call in the current code return NULL, but
>> the code after your patch behave in a well-defined (and correct) way?
> 
> Yeah, it's not at all clear to me this is solving a real problem. I know
> Andrey mentioned playing around with fault injection in an earlier
> thread, so I'm wondering if there is an artificial fault being injected
> into the second getenv() call. Which does not seem like something that
> should be possible in the real world.
> 
> I definitely agree with the sentiment that as few things as possible
> should happen between calling getenv() and using its result. I've seen
> real bugs there from unexpected invalidation of the static buffer.
> 
> -Peff

Thanks for your comments.

Jeff is right: there were some artificial fault injections imitating valid 
failures
of different functions (syscalls, libc and so on).
And yes - in the real life there is no problems with current code as there are 
no other
threads running.
However it's not a good practice to double call getenv() with the same argument:
* Code readability.
* Still no guaranty that the second call will be valid: some linked library may 
be
compromised or LD_PRELOADed with the aim to create a race with getenv(). I 
believe
there is no profit doing it here but it's just an explanation.

In my opinion, here it's ok to save the pointer returned from the single 
getnev() call
doing as few actions as possible between getenv() and strtol() calls.
I'll change the patch.

-- 
Best regards,
Andrey Okoshkin


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
Thanks for your review.

24.10.2017 19:28, Stefan Beller wrote:
> So I think this function is never called from within a threaded environment
> in git.
You are right, it's just a hypothetic case.
 
> Despite not being in a threaded environment, I wonder if we want to
> minimize the time between  calling getenv and the use of the result,
> i.e. declare merge_verbosity here, but assign it later, just before the
> condition?
> 
> (The compiler may shuffle stuff around anyway, so this is a
> moot suggestion; It gears mostly towards making the code more
> readable/maintainable when presenting this part of the code
> to the user.)
> 
> With or without this change:
> Reviewed-by: Stefan Beller 
Yes, in current situation it's more for readability. And I'll make the usage
of merge_verbosity just after the assignment.

-- 
Best regards,
Andrey Okoshkin


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Andrey Okoshkin
24.10.2017 19:45, Eric Sunshine wrote:
> I feel uncomfortable about this change, not due to lack of thread
> safety, but due to the distance between the getenv() invocation and
> its point of use. See below for more detail.

Thanks, the usage must be just after the assignment.

-- 
Best regards,
Andrey Okoshkin


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-25 Thread Jeff King
On Wed, Oct 25, 2017 at 12:07:12AM -0400, Eric Sunshine wrote:

> On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano  wrote:
> > Jeff King  writes:
> >> I definitely agree with the sentiment that as few things as possible
> >> should happen between calling getenv() and using its result. I've seen
> >> real bugs there from unexpected invalidation of the static buffer.
> >
> > Sure.  I do not think we mind xstrdup()ing the result and freeing
> > after we are done, though ;-)
> 
> More specifically, xstrdup_or_null(getenv(...)), if that route is chosen.

That would be the way to do it, but I do not see thta we need to record
the string at all. The current code is calling strtol on it on it
immediately.

So the options are:

  1. Save the result of getenv() in a variable. If it is non-NULL, then
 immediately call strtol() on it.

  2. Do nothing. The double-call to getenv() is actually fine in the
 real world as it will return consistent results.

But the patch under discussion, which calls getenv() then expects it
to be correct after a call to merge_recursive_config(), introduces a
problem.

-Peff


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>> I definitely agree with the sentiment that as few things as possible
>> should happen between calling getenv() and using its result. I've seen
>> real bugs there from unexpected invalidation of the static buffer.
>
> Sure.  I do not think we mind xstrdup()ing the result and freeing
> after we are done, though ;-)

More specifically, xstrdup_or_null(getenv(...)), if that route is chosen.


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Junio C Hamano
Jeff King  writes:

> I definitely agree with the sentiment that as few things as possible
> should happen between calling getenv() and using its result. I've seen
> real bugs there from unexpected invalidation of the static buffer.

Sure.  I do not think we mind xstrdup()ing the result and freeing
after we are done, though ;-)


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ågren wrote:

> On 24 October 2017 at 18:45, Eric Sunshine  wrote:
> > On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
> >> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
> >> wrote:
> >>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
> >>> init_merge_options().
> >>> Consequential call of getenv() may return NULL pointer and strtol() 
> >>> crashes.
> >>> However the stored pointer to the obtained getenv() result may be 
> >>> invalidated
> >>> by some other getenv() call from another thread as getenv() is not 
> >>> thread-safe.
> 
> I'm having trouble wrapping my head around this. Under which
> circumstances could the second call in the current code return NULL, but
> the code after your patch behave in a well-defined (and correct) way?

Yeah, it's not at all clear to me this is solving a real problem. I know
Andrey mentioned playing around with fault injection in an earlier
thread, so I'm wondering if there is an artificial fault being injected
into the second getenv() call. Which does not seem like something that
should be possible in the real world.

I definitely agree with the sentiment that as few things as possible
should happen between calling getenv() and using its result. I've seen
real bugs there from unexpected invalidation of the static buffer.

-Peff


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Martin Ågren
On 24 October 2017 at 18:45, Eric Sunshine  wrote:
> On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
>> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
>> wrote:
>>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
>>> init_merge_options().
>>> Consequential call of getenv() may return NULL pointer and strtol() crashes.
>>> However the stored pointer to the obtained getenv() result may be 
>>> invalidated
>>> by some other getenv() call from another thread as getenv() is not 
>>> thread-safe.

I'm having trouble wrapping my head around this. Under which
circumstances could the second call in the current code return NULL, but
the code after your patch behave in a well-defined (and correct) way?

> The distance between getenv() and the point where the value is
> actually used is a big concern due to not knowing what is or might be
> going on in called functions between the two points. According to [1],
> the value returned by getenv() could be invalidated by another call to
> getenv() (or setenv() or unsetenv() or putenv()), and we don't have
> guarantee that we're safe from such invalidation considering that this
> function calls out to others. For instance, after getenv() but before
> the value is used, init_merge_options() calls merge_recursive_config()
> which calls git_config() which calls git_xmerge_config(), and so on.
>
> For this reason, I have difficulty endorsing this change as-is.

Yeah. The call should be immediately before `merge_verbosity` is used.
Then, if a compiler wants to move the call, it has to do the work and
prove that it's ok.


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
> wrote:
>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
>> init_merge_options().
>> Consequential call of getenv() may return NULL pointer and strtol() crashes.
>> However the stored pointer to the obtained getenv() result may be invalidated
>> by some other getenv() call from another thread as getenv() is not 
>> thread-safe.
>
> But do we have other threads running at the time?

I feel uncomfortable about this change, not due to lack of thread
safety, but due to the distance between the getenv() invocation and
its point of use. See below for more detail.

>> Signed-off-by: Andrey Okoshkin 
>> ---
>> diff --git a/merge-recursive.c b/merge-recursive.c
>> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct 
>> merge_options *o)
>>  void init_merge_options(struct merge_options *o)
>>  {
>> +   const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
>
> Despite not being in a threaded environment, I wonder if we want to
> minimize the time between  calling getenv and the use of the result,
> i.e. declare merge_verbosity here, but assign it later, just before the
> condition?
>
> With or without this change:
> Reviewed-by: Stefan Beller 

The distance between getenv() and the point where the value is
actually used is a big concern due to not knowing what is or might be
going on in called functions between the two points. According to [1],
the value returned by getenv() could be invalidated by another call to
getenv() (or setenv() or unsetenv() or putenv()), and we don't have
guarantee that we're safe from such invalidation considering that this
function calls out to others. For instance, after getenv() but before
the value is used, init_merge_options() calls merge_recursive_config()
which calls git_config() which calls git_xmerge_config(), and so on.

For this reason, I have difficulty endorsing this change as-is.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html

>> memset(o, 0, sizeof(struct merge_options));
>> o->verbosity = 2;
>> o->buffer_output = 1;
>> @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o)
>> o->renormalize = 0;
>> o->detect_rename = 1;
>> merge_recursive_config(o);
>> -   if (getenv("GIT_MERGE_VERBOSITY"))
>> -   o->verbosity =
>> -   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
>> +   if (merge_verbosity)
>> +   o->verbosity = strtol(merge_verbosity, NULL, 10);
>> if (o->verbosity >= 5)
>> o->buffer_output = 0;
>> strbuf_init(>obuf, 0);
>> --
>> 2.14.3


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  wrote:
> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
> init_merge_options().
> Consequential call of getenv() may return NULL pointer and strtol() crashes.
> However the stored pointer to the obtained getenv() result may be invalidated
> by some other getenv() call from another thread as getenv() is not 
> thread-safe.

But do we have other threads running at the time?
Inspecting the four callsites:
* sequencer.c:
  The prior lines to hold the index lock suggests we're not in a threaded
  environment
* builtin/merge-recursive.c:
  In cmd_merge_recursive and we're fiddling with argv/argc, which
  suggests we in a main function, not having threads around
* builtin/am.c: fall_back_threeway called by am_run.
  (am is not threaded)
* builtin/merge:
  In try_merge_strategy called from the main function, also index locking
* builtin/checkout.c:
  merge_working_tree also locks the index.

So I think this function is never called from within a threaded environment
in git.

>
> Signed-off-by: Andrey Okoshkin 
> ---
>  merge-recursive.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 1494ffdb8..eaac98145 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options 
> *o)
>
>  void init_merge_options(struct merge_options *o)
>  {
> +   const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY");

Despite not being in a threaded environment, I wonder if we want to
minimize the time between  calling getenv and the use of the result,
i.e. declare merge_verbosity here, but assign it later, just before the
condition?

(The compiler may shuffle stuff around anyway, so this is a
moot suggestion; It gears mostly towards making the code more
readable/maintainable when presenting this part of the code
to the user.)

With or without this change:
Reviewed-by: Stefan Beller 


> memset(o, 0, sizeof(struct merge_options));
> o->verbosity = 2;
> o->buffer_output = 1;
> @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o)
> o->renormalize = 0;
> o->detect_rename = 1;
> merge_recursive_config(o);
> -   if (getenv("GIT_MERGE_VERBOSITY"))
> -   o->verbosity =
> -   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
> +   if (merge_verbosity)
> +   o->verbosity = strtol(merge_verbosity, NULL, 10);
> if (o->verbosity >= 5)
> o->buffer_output = 0;
> strbuf_init(>obuf, 0);
> --
> 2.14.3