Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
24.10.2017 22:52, Jeff King wrote: > On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ågren wrote: > >> On 24 October 2017 at 18:45, Eric Sunshine wrote: >>> On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin wrote: > Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in > init_merge_options(). > Consequential call of getenv() may return NULL pointer and strtol() > crashes. > However the stored pointer to the obtained getenv() result may be > invalidated > by some other getenv() call from another thread as getenv() is not > thread-safe. >> >> I'm having trouble wrapping my head around this. Under which >> circumstances could the second call in the current code return NULL, but >> the code after your patch behave in a well-defined (and correct) way? > > Yeah, it's not at all clear to me this is solving a real problem. I know > Andrey mentioned playing around with fault injection in an earlier > thread, so I'm wondering if there is an artificial fault being injected > into the second getenv() call. Which does not seem like something that > should be possible in the real world. > > I definitely agree with the sentiment that as few things as possible > should happen between calling getenv() and using its result. I've seen > real bugs there from unexpected invalidation of the static buffer. > > -Peff Thanks for your comments. Jeff is right: there were some artificial fault injections imitating valid failures of different functions (syscalls, libc and so on). And yes - in the real life there is no problems with current code as there are no other threads running. However it's not a good practice to double call getenv() with the same argument: * Code readability. * Still no guaranty that the second call will be valid: some linked library may be compromised or LD_PRELOADed with the aim to create a race with getenv(). I believe there is no profit doing it here but it's just an explanation. In my opinion, here it's ok to save the pointer returned from the single getnev() call doing as few actions as possible between getenv() and strtol() calls. I'll change the patch. -- Best regards, Andrey Okoshkin
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
Thanks for your review. 24.10.2017 19:28, Stefan Beller wrote: > So I think this function is never called from within a threaded environment > in git. You are right, it's just a hypothetic case. > Despite not being in a threaded environment, I wonder if we want to > minimize the time between calling getenv and the use of the result, > i.e. declare merge_verbosity here, but assign it later, just before the > condition? > > (The compiler may shuffle stuff around anyway, so this is a > moot suggestion; It gears mostly towards making the code more > readable/maintainable when presenting this part of the code > to the user.) > > With or without this change: > Reviewed-by: Stefan Beller Yes, in current situation it's more for readability. And I'll make the usage of merge_verbosity just after the assignment. -- Best regards, Andrey Okoshkin
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
24.10.2017 19:45, Eric Sunshine wrote: > I feel uncomfortable about this change, not due to lack of thread > safety, but due to the distance between the getenv() invocation and > its point of use. See below for more detail. Thanks, the usage must be just after the assignment. -- Best regards, Andrey Okoshkin
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Wed, Oct 25, 2017 at 12:07:12AM -0400, Eric Sunshine wrote: > On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano wrote: > > Jeff King writes: > >> I definitely agree with the sentiment that as few things as possible > >> should happen between calling getenv() and using its result. I've seen > >> real bugs there from unexpected invalidation of the static buffer. > > > > Sure. I do not think we mind xstrdup()ing the result and freeing > > after we are done, though ;-) > > More specifically, xstrdup_or_null(getenv(...)), if that route is chosen. That would be the way to do it, but I do not see thta we need to record the string at all. The current code is calling strtol on it on it immediately. So the options are: 1. Save the result of getenv() in a variable. If it is non-NULL, then immediately call strtol() on it. 2. Do nothing. The double-call to getenv() is actually fine in the real world as it will return consistent results. But the patch under discussion, which calls getenv() then expects it to be correct after a call to merge_recursive_config(), introduces a problem. -Peff
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano wrote: > Jeff King writes: >> I definitely agree with the sentiment that as few things as possible >> should happen between calling getenv() and using its result. I've seen >> real bugs there from unexpected invalidation of the static buffer. > > Sure. I do not think we mind xstrdup()ing the result and freeing > after we are done, though ;-) More specifically, xstrdup_or_null(getenv(...)), if that route is chosen.
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
Jeff King writes: > I definitely agree with the sentiment that as few things as possible > should happen between calling getenv() and using its result. I've seen > real bugs there from unexpected invalidation of the static buffer. Sure. I do not think we mind xstrdup()ing the result and freeing after we are done, though ;-)
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ågren wrote: > On 24 October 2017 at 18:45, Eric Sunshine wrote: > > On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: > >> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin > >> wrote: > >>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in > >>> init_merge_options(). > >>> Consequential call of getenv() may return NULL pointer and strtol() > >>> crashes. > >>> However the stored pointer to the obtained getenv() result may be > >>> invalidated > >>> by some other getenv() call from another thread as getenv() is not > >>> thread-safe. > > I'm having trouble wrapping my head around this. Under which > circumstances could the second call in the current code return NULL, but > the code after your patch behave in a well-defined (and correct) way? Yeah, it's not at all clear to me this is solving a real problem. I know Andrey mentioned playing around with fault injection in an earlier thread, so I'm wondering if there is an artificial fault being injected into the second getenv() call. Which does not seem like something that should be possible in the real world. I definitely agree with the sentiment that as few things as possible should happen between calling getenv() and using its result. I've seen real bugs there from unexpected invalidation of the static buffer. -Peff
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On 24 October 2017 at 18:45, Eric Sunshine wrote: > On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: >> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin >> wrote: >>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in >>> init_merge_options(). >>> Consequential call of getenv() may return NULL pointer and strtol() crashes. >>> However the stored pointer to the obtained getenv() result may be >>> invalidated >>> by some other getenv() call from another thread as getenv() is not >>> thread-safe. I'm having trouble wrapping my head around this. Under which circumstances could the second call in the current code return NULL, but the code after your patch behave in a well-defined (and correct) way? > The distance between getenv() and the point where the value is > actually used is a big concern due to not knowing what is or might be > going on in called functions between the two points. According to [1], > the value returned by getenv() could be invalidated by another call to > getenv() (or setenv() or unsetenv() or putenv()), and we don't have > guarantee that we're safe from such invalidation considering that this > function calls out to others. For instance, after getenv() but before > the value is used, init_merge_options() calls merge_recursive_config() > which calls git_config() which calls git_xmerge_config(), and so on. > > For this reason, I have difficulty endorsing this change as-is. Yeah. The call should be immediately before `merge_verbosity` is used. Then, if a compiler wants to move the call, it has to do the work and prove that it's ok.
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: > On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin > wrote: >> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in >> init_merge_options(). >> Consequential call of getenv() may return NULL pointer and strtol() crashes. >> However the stored pointer to the obtained getenv() result may be invalidated >> by some other getenv() call from another thread as getenv() is not >> thread-safe. > > But do we have other threads running at the time? I feel uncomfortable about this change, not due to lack of thread safety, but due to the distance between the getenv() invocation and its point of use. See below for more detail. >> Signed-off-by: Andrey Okoshkin >> --- >> diff --git a/merge-recursive.c b/merge-recursive.c >> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct >> merge_options *o) >> void init_merge_options(struct merge_options *o) >> { >> + const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY"); > > Despite not being in a threaded environment, I wonder if we want to > minimize the time between calling getenv and the use of the result, > i.e. declare merge_verbosity here, but assign it later, just before the > condition? > > With or without this change: > Reviewed-by: Stefan Beller The distance between getenv() and the point where the value is actually used is a big concern due to not knowing what is or might be going on in called functions between the two points. According to [1], the value returned by getenv() could be invalidated by another call to getenv() (or setenv() or unsetenv() or putenv()), and we don't have guarantee that we're safe from such invalidation considering that this function calls out to others. For instance, after getenv() but before the value is used, init_merge_options() calls merge_recursive_config() which calls git_config() which calls git_xmerge_config(), and so on. For this reason, I have difficulty endorsing this change as-is. [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html >> memset(o, 0, sizeof(struct merge_options)); >> o->verbosity = 2; >> o->buffer_output = 1; >> @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o) >> o->renormalize = 0; >> o->detect_rename = 1; >> merge_recursive_config(o); >> - if (getenv("GIT_MERGE_VERBOSITY")) >> - o->verbosity = >> - strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10); >> + if (merge_verbosity) >> + o->verbosity = strtol(merge_verbosity, NULL, 10); >> if (o->verbosity >= 5) >> o->buffer_output = 0; >> strbuf_init(&o->obuf, 0); >> -- >> 2.14.3
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin wrote: > Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in > init_merge_options(). > Consequential call of getenv() may return NULL pointer and strtol() crashes. > However the stored pointer to the obtained getenv() result may be invalidated > by some other getenv() call from another thread as getenv() is not > thread-safe. But do we have other threads running at the time? Inspecting the four callsites: * sequencer.c: The prior lines to hold the index lock suggests we're not in a threaded environment * builtin/merge-recursive.c: In cmd_merge_recursive and we're fiddling with argv/argc, which suggests we in a main function, not having threads around * builtin/am.c: fall_back_threeway called by am_run. (am is not threaded) * builtin/merge: In try_merge_strategy called from the main function, also index locking * builtin/checkout.c: merge_working_tree also locks the index. So I think this function is never called from within a threaded environment in git. > > Signed-off-by: Andrey Okoshkin > --- > merge-recursive.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/merge-recursive.c b/merge-recursive.c > index 1494ffdb8..eaac98145 100644 > --- a/merge-recursive.c > +++ b/merge-recursive.c > @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options > *o) > > void init_merge_options(struct merge_options *o) > { > + const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY"); Despite not being in a threaded environment, I wonder if we want to minimize the time between calling getenv and the use of the result, i.e. declare merge_verbosity here, but assign it later, just before the condition? (The compiler may shuffle stuff around anyway, so this is a moot suggestion; It gears mostly towards making the code more readable/maintainable when presenting this part of the code to the user.) With or without this change: Reviewed-by: Stefan Beller > memset(o, 0, sizeof(struct merge_options)); > o->verbosity = 2; > o->buffer_output = 1; > @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o) > o->renormalize = 0; > o->detect_rename = 1; > merge_recursive_config(o); > - if (getenv("GIT_MERGE_VERBOSITY")) > - o->verbosity = > - strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10); > + if (merge_verbosity) > + o->verbosity = strtol(merge_verbosity, NULL, 10); > if (o->verbosity >= 5) > o->buffer_output = 0; > strbuf_init(&o->obuf, 0); > -- > 2.14.3