Re: [PATCH v5 10/11] setup_git_directory_gently_1(): avoid die()ing

2017-03-13 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Mon, 13 Mar 2017, Junio C Hamano wrote:
>
>> When a patch series is refactoring an existing function to be used
>> in different codepaths, an existing issue inherited from the
>> original code (e.g. perhaps an existing error checking that is
>> looser than ideal) may have been OK in the original context (e.g.
>> perhaps it died and refused to run until the user corrected the
>> repository), and it still is OK in the codepath that uses the
>> refactored building blocks to replace the original code, but it may
>> not be acceptable to leave the issue in the original code when a new
>> caller calls the refactored building block (e.g. perhaps the
>> refactoring made it possible for a caller to ask the function not to
>> die and instead act on different kinds of errors in different ways).
>
> In this case, ...

It becomes somewhat irritating when you get combative and defensive
unnecessarily.  We already established this case is OK two messages
ago, I think.

The above is only to make sure that people cannot take the "issues
in the original is OK to leave outside the scope of a new series" in
my message out of context and treat it as a general rule to justify
their sloppy patches in the future.  We need to see if issues
inherited from the original is necessary to fix before refactoring
even begins on case-by-case basis, seeing what the requirement of
the new code that uses the refactored code.  And the case-by-case
thing we already did for _this_ case.



Re: [PATCH v5 10/11] setup_git_directory_gently_1(): avoid die()ing

2017-03-13 Thread Johannes Schindelin
Hi Junio,

On Mon, 13 Mar 2017, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > The former case may split into two.  "Yes I agree that is bad and it
> > is the same badness as the version without this change", in which case
> > we may want to leave a "NEEDSWORK" comment in-code so that people can
> > notice when browsing the code (but fixing the badness would be outside
> > the scope of the series), and "yes that is bad and we shouldn't
> > introduce that badness", in which case we do want to fix it in the
> > patch.
> 
> We however need to be a bit careful here, though.
> 
> When a patch series is refactoring an existing function to be used
> in different codepaths, an existing issue inherited from the
> original code (e.g. perhaps an existing error checking that is
> looser than ideal) may have been OK in the original context (e.g.
> perhaps it died and refused to run until the user corrected the
> repository), and it still is OK in the codepath that uses the
> refactored building blocks to replace the original code, but it may
> not be acceptable to leave the issue in the original code when a new
> caller calls the refactored building block (e.g. perhaps the
> refactoring made it possible for a caller to ask the function not to
> die and instead act on different kinds of errors in different ways).

In this case, discover_git_directory() is intended to use the exact same
logic as setup_git_directory() (including bugs and all) so that they do
discover the same directory.

It would not do for discover_git_directory() to detect a *different*
directory, no matter how much "more correct" one would deem it.

Ciao,
Johannes


Re: [PATCH v5 10/11] setup_git_directory_gently_1(): avoid die()ing

2017-03-13 Thread Junio C Hamano
Junio C Hamano  writes:

> The former case may split into two.  "Yes I agree that is bad and it
> is the same badness as the version without this change", in which
> case we may want to leave a "NEEDSWORK" comment in-code so that
> people can notice when browsing the code (but fixing the badness
> would be outside the scope of the series), and "yes that is bad and
> we shouldn't introduce that badness", in which case we do want to
> fix it in the patch.

We however need to be a bit careful here, though.

When a patch series is refactoring an existing function to be used
in different codepaths, an existing issue inherited from the
original code (e.g. perhaps an existing error checking that is
looser than ideal) may have been OK in the original context (e.g.
perhaps it died and refused to run until the user corrected the
repository), and it still is OK in the codepath that uses the
refactored building blocks to replace the original code, but it may
not be acceptable to leave the issue in the original code when a new
caller calls the refactored building block (e.g. perhaps the
refactoring made it possible for a caller to ask the function not to
die and instead act on different kinds of errors in different ways).

So "being bug-to-bug compatible with existing code" is not always a
good thing---we need to see case by case if a problem identified in
the review as inherited from the original needs to be addressed in
the series.

It would be better to address issues we identify even if it is an
old one.  After all, any change has potential to introduce new bugs,
and striving to leave known bug inherited from the old code around
would guarantee that the resulting code will be buggier than the
original ;-) 

But in order to keep bisectability, such "further fixups" should be
done as a follow-up to the series, not as part of it.


Re: [PATCH v5 10/11] setup_git_directory_gently_1(): avoid die()ing

2017-03-13 Thread Junio C Hamano
Johannes Schindelin  writes:

> PLEASE NOTE: the purpose of this patch series is to allow the same
> function (setup_git_directory_gently_1()) to be the work horse for
> setup_git_directory() as before, but also for the new
> discover_git_directory() function that is introduced to fix
> read_early_config() to avoid hardconfig .git/config.
>
> The purpose is *not* to change any current behavior. Please do not ask me
> to do that in this patch series.

Please note that pointing out potential problems is only asking for
confirmation ("yes that is bad") or for enlightenment ("no, you are
reading the code wrong and that bad thing does not happen
because...").  Even in the former case, unless the badness is
introduced by the change, pointing out potential problems is *NOT*
asking to change the patch to fix existing issues.

The former case may split into two.  "Yes I agree that is bad and it
is the same badness as the version without this change", in which
case we may want to leave a "NEEDSWORK" comment in-code so that
people can notice when browsing the code (but fixing the badness
would be outside the scope of the series), and "yes that is bad and
we shouldn't introduce that badness", in which case we do want to
fix it in the patch.

So do not read my (or anybody else's) reviews as _asking_ changes
and making further fixes, and you'll do better.


Re: [PATCH v5 10/11] setup_git_directory_gently_1(): avoid die()ing

2017-03-13 Thread Johannes Schindelin
Hi Junio,

PLEASE NOTE: the purpose of this patch series is to allow the same
function (setup_git_directory_gently_1()) to be the work horse for
setup_git_directory() as before, but also for the new
discover_git_directory() function that is introduced to fix
read_early_config() to avoid hardconfig .git/config.

The purpose is *not* to change any current behavior. Please do not ask me
to do that in this patch series.

Now to your comments:

On Fri, 10 Mar 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > @@ -890,14 +892,22 @@ static enum discovery_result 
> > setup_git_directory_gently_1(struct strbuf *dir,
> > if (one_filesystem)
> > current_device = get_device_or_die(dir->buf, NULL, 0);
> > for (;;) {
> > -   int offset = dir->len;
> > +   int offset = dir->len, error_code = 0;
> >  
> > if (offset > min_offset)
> > strbuf_addch(dir, '/');
> > strbuf_addstr(dir, DEFAULT_GIT_DIR_ENVIRONMENT);
> > -   gitdirenv = read_gitfile(dir->buf);
> > -   if (!gitdirenv && is_git_directory(dir->buf))
> > -   gitdirenv = DEFAULT_GIT_DIR_ENVIRONMENT;
> > +   gitdirenv = read_gitfile_gently(dir->buf, die_on_error ?
> > +   NULL : &error_code);
> > +   if (!gitdirenv) {
> 
> We tried to read ".git" as a regular file, but couldn't.  There are
> some cases but I am having trouble to convince myself all cases are
> covered correctly here, so let me follow the code aloud.

It is subtle and finicky, I agree.

> > +   if (die_on_error ||
> > +   error_code == READ_GITFILE_ERR_NOT_A_FILE) {
> > +   if (is_git_directory(dir->buf))
> > +   gitdirenv = DEFAULT_GIT_DIR_ENVIRONMENT;
> 
> If we are told to die on error, but if it is a Git directory, then
> we do not have to (and want to) die and instead report that
> directory as discovered.
> 
> If we are to report the failure status instead of dying, we also do
> want to recover when the read_gitfile_gentrly() failed only because
> it was a real Git directory.  READ_GITFILE_ERR_NOT_A_FILE could be a
> signal for that, and we recover after making sure it is a Git
> directory.
> 
> Among the READ_GITFILE_ERR_* codes, ERR_NOT_A_FILE is the only one
> we attempt this recovery.  All others just take the "else if" thing
> below.
> 
> What happens when is_git_directory() test here does not pass,
> though?  Let's say stat() in read_gitfile_gently() found a FIFO
> there, it gives us ERR_NOT_A_FILE, is_git_directory() would say
> "Nah", and then ...?  Don't we want the other side for this if()
> statement that returns failure?

The current code as per `master` detects the NOT_A_FILE condition, and as
the parameter return_error_code was passed as NULL (because read_gitfile is
actually #define'd in cache.h to call read_gitfile_gently with NULL as
second parameter), it calls read_gitfile_error_die(). That function
*ignores* the NOT_A_FILE error condition, and as a consequence returns to
read_gitfile_gently() which then returns NULL because there *was* an
error_code.

That means that setup_git_directory_gently_1() will retrieve a NULL from
the read_gitfile() call, which means it then goes on to test whether it
is looking at a .git/ directory via is_git_directory() and if that returns
false, the loop will continue to look at the *parent* directory.

That, in turn, means that what you are asking for is a change in behavior,
and as I am unwilling to introduce a change in behavior with this patch
series, my patch does the exact right thing.

> > +   } else if (error_code &&
> > +  error_code != READ_GITFILE_ERR_STAT_FAILED)
> > +   return GIT_DIR_INVALID_GITFILE;
> > +   }
> 
> On the other hand, if read_gitfile_gently() didn't say "we found
> something that is not a regular file" we come here.  If the error
> came because there wasn't ".git" in the directory we are looking at,
> i.e. stat(2) in read_gitfile_gently() gave ENOENT, it is perfectly
> normal and we just want to go one level up.
> 
> But shouldn't read_gitfile_gently() be inspecting errno when it sees
> a stat() failure?  If there _is_ ".git" but we failed to stat it for
> whatever reason (EACCES, ELOOP,...), we do not want to go up but
> give the INVALID_GITFILE from here, just like other cases, no?
> For that I imagine that ERR_STAT_FAILED needs to be split into two,
> one for true ERR_STAT_FAILED (from which we cannot recover) and the
> other for ERR_ENOENT to signal us that there is nothing there (which
> allows us to go up).

True. But again, you are asking for a *change in behavior*, which is not
the purpose of this patch series.

> By the way, is the "error_code &&" needed?

It is not! I had the order of the if/else blocks completely differently
originally [*1*] and just overlooked tha

Re: [PATCH v5 10/11] setup_git_directory_gently_1(): avoid die()ing

2017-03-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> @@ -890,14 +892,22 @@ static enum discovery_result 
> setup_git_directory_gently_1(struct strbuf *dir,
>   if (one_filesystem)
>   current_device = get_device_or_die(dir->buf, NULL, 0);
>   for (;;) {
> - int offset = dir->len;
> + int offset = dir->len, error_code = 0;
>  
>   if (offset > min_offset)
>   strbuf_addch(dir, '/');
>   strbuf_addstr(dir, DEFAULT_GIT_DIR_ENVIRONMENT);
> - gitdirenv = read_gitfile(dir->buf);
> - if (!gitdirenv && is_git_directory(dir->buf))
> - gitdirenv = DEFAULT_GIT_DIR_ENVIRONMENT;
> + gitdirenv = read_gitfile_gently(dir->buf, die_on_error ?
> + NULL : &error_code);
> + if (!gitdirenv) {

We tried to read ".git" as a regular file, but couldn't.  There are
some cases but I am having trouble to convince myself all cases are
covered correctly here, so let me follow the code aloud.

> + if (die_on_error ||
> + error_code == READ_GITFILE_ERR_NOT_A_FILE) {
> + if (is_git_directory(dir->buf))
> + gitdirenv = DEFAULT_GIT_DIR_ENVIRONMENT;

If we are told to die on error, but if it is a Git directory, then
we do not have to (and want to) die and instead report that
directory as discovered.

If we are to report the failure status instead of dying, we also do
want to recover when the read_gitfile_gentrly() failed only because
it was a real Git directory.  READ_GITFILE_ERR_NOT_A_FILE could be a
signal for that, and we recover after making sure it is a Git
directory.

Among the READ_GITFILE_ERR_* codes, ERR_NOT_A_FILE is the only one
we attempt this recovery.  All others just take the "else if" thing
below.

What happens when is_git_directory() test here does not pass,
though?  Let's say stat() in read_gitfile_gently() found a FIFO
there, it gives us ERR_NOT_A_FILE, is_git_directory() would say
"Nah", and then ...?  Don't we want the other side for this if()
statement that returns failure?

> + } else if (error_code &&
> +error_code != READ_GITFILE_ERR_STAT_FAILED)
> + return GIT_DIR_INVALID_GITFILE;
> + }

On the other hand, if read_gitfile_gently() didn't say "we found
something that is not a regular file" we come here.  If the error
came because there wasn't ".git" in the directory we are looking at,
i.e. stat(2) in read_gitfile_gently() gave ENOENT, it is perfectly
normal and we just want to go one level up.

But shouldn't read_gitfile_gently() be inspecting errno when it sees
a stat() failure?  If there _is_ ".git" but we failed to stat it for
whatever reason (EACCES, ELOOP,...), we do not want to go up but
give the INVALID_GITFILE from here, just like other cases, no?
For that I imagine that ERR_STAT_FAILED needs to be split into two,
one for true ERR_STAT_FAILED (from which we cannot recover) and the
other for ERR_ENOENT to signal us that there is nothing there (which
allows us to go up).

By the way, is the "error_code &&" needed?  Unless the original path
given to read_gitfile_gently() is a NULL pointer, the only time its
return value is NULL is when it has non-zero error_code.


>   strbuf_setlen(dir, offset);
>   if (gitdirenv) {
>   strbuf_addstr(gitdir, gitdirenv);
> @@ -934,7 +944,7 @@ const char *discover_git_directory(struct strbuf *gitdir)
>   return NULL;
>  
>   cwd_len = dir.len;
> - if (setup_git_directory_gently_1(&dir, gitdir) <= 0) {
> + if (setup_git_directory_gently_1(&dir, gitdir, 0) <= 0) {
>   strbuf_release(&dir);
>   return NULL;
>   }
> @@ -994,7 +1004,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>   die_errno(_("Unable to read current working directory"));
>   strbuf_addbuf(&dir, &cwd);
>  
> - switch (setup_git_directory_gently_1(&dir, &gitdir)) {
> + switch (setup_git_directory_gently_1(&dir, &gitdir, 1)) {
>   case GIT_DIR_NONE:
>   prefix = NULL;
>   break;