Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-21 Thread Stefan Beller
>> I guess we can still refactor later, it's just one
>> thing to thing about when introducing an API
>> that will likely be used a lot down the road.
>
> I'm not sure what we want right now, hence why I left it a little more
> vague.  At this point in time all the relevant callers I can think of
> (or rather potential callers) don't care about the failure and just want
> to know if it succeeded.  I think it would be ok to do a small refactor
> at a later time if we really needed to provide the reason for the
> failure.  Unless of course someone feels strongly enough that it needs
> to be addressed right now.  If we did address it now then we would need
> a group of #define's or maybe an enum to describe the failure modes.

I do not feel strongly, just wanted to draw your attention to it.
And having thought about it, refactoring down the road is likely quite cheap,
so this was a useless bikeshedding attempt.

>>
>> I applaud the effort towards documenting what each variable is
>> supposed to contain. But some of them read like
>>
>> /* increments i by one */
>> i++;
>>
>> which is considered bad comment style (it doesn't add
>> more information, it just wastes a line), so specifically for
>> all the "Path to X" comments:
>> * Are they absolute path, or relative path?
>>   If relative, then relative to what?
>> * Can they be NULL? When?
>>
>> (* Why do we need so many path?
>> Could one of them be constructed using
>> another and then hardcoding a string relative to it?
>> This question may rather be answered in the commit
>> message)
>
> Thanks for pointing this out.  I'll work a little bit more on the
> comments to be more descriptive.  I do think that all field names should
> probably be commented though.

Thanks!
Stefan


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-21 Thread Brandon Williams
On 06/20, Stefan Beller wrote:
> On Tue, Jun 20, 2017 at 12:19 PM, Brandon Williams  wrote:
> > Introduce the repository object 'struct repository' which can be used to
> > hold all state pertaining to a git repository.
> >
> > Some of the benefits of object-ifying a repository are:
> >
> >   1. Make the code base more readable and easier to reason about.
> >
> >   2. Allow for working on multiple repositories, specifically
> >  submodules, within the same process.  Currently the process for
> >  working on a submodule involves setting up an argv_array of options
> >  for a particular command and then launching a child process to
> >  execute the command in the context of the submodule.  This is
> >  clunky and can require lots of little hacks in order to ensure
> >  correctness.  Ideally it would be nice to simply pass a repository
> >  and an options struct to a command.
> >
> >   3. Eliminating reliance on global state will make it easier to
> >  enable the use of threading to improve performance.
> >
> > Signed-off-by: Brandon Williams 
> 
> > +/*
> > + * Initialize 'repo' based on the provided 'gitdir'.
> > + * Return 0 upon success and a non-zero value upon failure.
> 
> Non zero or negative? The point of the question is if we want to
> ask users of this function to be cautious early on. So in the future,
> do we want to rather see
> 
> if (repo_init(...))
> die("you're doomed");
> 
> or rather
> 
> int x = repo_init(...);
> if (x < 0)
> die("you're doomed");
> else if (x == 1)
> warning("you're not doomed, but close."\
>  "Not distimming the gostaks.")
> else
> ; /* we're fine, carry on with life */
> 
> I guess we can still refactor later, it's just one
> thing to thing about when introducing an API
> that will likely be used a lot down the road.

I'm not sure what we want right now, hence why I left it a little more
vague.  At this point in time all the relevant callers I can think of
(or rather potential callers) don't care about the failure and just want
to know if it succeeded.  I think it would be ok to do a small refactor
at a later time if we really needed to provide the reason for the
failure.  Unless of course someone feels strongly enough that it needs
to be addressed right now.  If we did address it now then we would need
a group of #define's or maybe an enum to describe the failure modes.

> 
> > +struct repository {
> > +   /* Environment */
> > +   /* Path to the git directory */
> > +   char *gitdir;
> > +
> > +   /* Path to the common git directory */
> > +   char *commondir;
> > +
> > +   /* Path to the repository's object store */
> > +   char *objectdir;
> > +
> > +   /* Path to the repository's graft file */
> > +   char *graft_file;
> > +
> > +   /* Path to the current worktree's index file */
> > +   char *index_file;
> > +
> > +   /* Path to the working directory */
> > +   char *worktree;
> > +
> > +   /* Configurations */
> > +   /*
> > +* Bit used during initialization to indicate if repository state 
> > (like
> > +* the location of the 'objectdir') should be read from the
> > +* environment.  By default this bit will be set at the begining of
> > +* 'repo_init()' so that all repositories will ignore the 
> > environment.
> > +* The exception to this is 'the_repository', which doesn't go 
> > through
> > +* the normal 'repo_init()' process.
> > +*/
> > +   unsigned ignore_env:1;
> > +
> > +   /* Indicate if a repository has a different 'commondir' from 
> > 'gitdir' */
> > +   unsigned different_commondir:1;
> > +};
> 
> I applaud the effort towards documenting what each variable is
> supposed to contain. But some of them read like
> 
> /* increments i by one */
> i++;
> 
> which is considered bad comment style (it doesn't add
> more information, it just wastes a line), so specifically for
> all the "Path to X" comments:
> * Are they absolute path, or relative path?
>   If relative, then relative to what?
> * Can they be NULL? When?
> 
> (* Why do we need so many path?
> Could one of them be constructed using
> another and then hardcoding a string relative to it?
> This question may rather be answered in the commit
> message)

Thanks for pointing this out.  I'll work a little bit more on the
comments to be more descriptive.  I do think that all field names should
probably be commented though.

-- 
Brandon Williams


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Brandon Williams
On 06/20, Jonathan Tan wrote:
> On Tue, 20 Jun 2017 12:19:35 -0700
> Brandon Williams  wrote:
> 
> > Introduce the repository object 'struct repository' which can be used to
> > hold all state pertaining to a git repository.
> > 
> > Some of the benefits of object-ifying a repository are:
> > 
> >   1. Make the code base more readable and easier to reason about.
> > 
> >   2. Allow for working on multiple repositories, specifically
> >  submodules, within the same process.  Currently the process for
> >  working on a submodule involves setting up an argv_array of options
> >  for a particular command and then launching a child process to
> >  execute the command in the context of the submodule.  This is
> >  clunky and can require lots of little hacks in order to ensure
> >  correctness.  Ideally it would be nice to simply pass a repository
> >  and an options struct to a command.
> > 
> >   3. Eliminating reliance on global state will make it easier to
> >  enable the use of threading to improve performance.
> 
> These would indeed be nice to have.
> 
> > +/* called after setting gitdir */
> > +static void repo_setup_env(struct repository *repo)
> > +{
> > +   struct strbuf sb = STRBUF_INIT;
> > +
> > +   if (!repo->gitdir)
> > +   BUG("gitdir wasn't set before setting up the environment");
> > +
> > +   repo->different_commondir = find_common_dir(, repo->gitdir,
> > +   !repo->ignore_env);
> > +   repo->commondir = strbuf_detach(, NULL);
> > +   repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
> > +   "objects", !repo->ignore_env);
> > +   repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
> > +"info/grafts", !repo->ignore_env);
> > +   repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
> > +"index", !repo->ignore_env);
> > +}
> 
> It seems that this function is only used once in repo_set_gitdir() -
> could this be inlined there instead? Then you wouldn't need the comment
> and the !repo->gitdir check.

I'd like to keep them separate as it makes things a little clearer in my
opinion, smaller functions and the like.

> 
> > +static int verify_repo_format(struct repository_format *format,
> > + const char *commondir)
> > +{
> > +   int ret = 0;
> > +   struct strbuf sb = STRBUF_INIT;
> > +
> > +   strbuf_addf(, "%s/config", commondir);
> > +   read_repository_format(format, sb.buf);
> > +   strbuf_reset();
> > +
> > +   if (verify_repository_format(format, ) < 0) {
> > +   warning("%s", sb.buf);
> > +   ret = -1;
> > +   }
> > +
> > +   strbuf_release();
> > +   return ret;
> > +}
> 
> This function is confusingly named - firstly, there is already a
> verify_repository_format(), and secondly, this function both reads and
> verifies it.

This one is static to the file, and i don't think its named confusingly
as it does just what it says it does. I'm trying to limit how much other
code I change so I had to make this function.

> 
> > +void repo_clear(struct repository *repo)
> > +{
> > +   free(repo->gitdir);
> > +   repo->gitdir = NULL;
> > +   free(repo->commondir);
> > +   repo->commondir = NULL;
> > +   free(repo->objectdir);
> > +   repo->objectdir = NULL;
> > +   free(repo->graft_file);
> > +   repo->graft_file = NULL;
> > +   free(repo->index_file);
> > +   repo->index_file = NULL;
> > +   free(repo->worktree);
> > +   repo->worktree = NULL;
> > +
> > +   memset(repo, 0, sizeof(*repo));
> > +}
> 
> If you're going to memset, you probably don't need to set everything to
> NULL.
> 
> > +   /* Configurations */
> > +   /*
> > +* Bit used during initialization to indicate if repository state (like
> > +* the location of the 'objectdir') should be read from the
> > +* environment.  By default this bit will be set at the begining of
> > +* 'repo_init()' so that all repositories will ignore the environment.
> > +* The exception to this is 'the_repository', which doesn't go through
> > +* the normal 'repo_init()' process.
> > +*/
> > +   unsigned ignore_env:1;
> 
> If this is only used during initialization, could this be passed as a
> separate parameter internally instead of having it here?

It would feel a little wonky to be a parameter as 'the_repository' is
initialized differently than a repository would normally be.  Theres
been a lot of cleanup to the setup logic but it would need to be cleaned
up even more to have this be a param.

-- 
Brandon Williams


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Jonathan Tan
On Tue, 20 Jun 2017 12:19:35 -0700
Brandon Williams  wrote:

> Introduce the repository object 'struct repository' which can be used to
> hold all state pertaining to a git repository.
> 
> Some of the benefits of object-ifying a repository are:
> 
>   1. Make the code base more readable and easier to reason about.
> 
>   2. Allow for working on multiple repositories, specifically
>  submodules, within the same process.  Currently the process for
>  working on a submodule involves setting up an argv_array of options
>  for a particular command and then launching a child process to
>  execute the command in the context of the submodule.  This is
>  clunky and can require lots of little hacks in order to ensure
>  correctness.  Ideally it would be nice to simply pass a repository
>  and an options struct to a command.
> 
>   3. Eliminating reliance on global state will make it easier to
>  enable the use of threading to improve performance.

These would indeed be nice to have.

> +/* called after setting gitdir */
> +static void repo_setup_env(struct repository *repo)
> +{
> + struct strbuf sb = STRBUF_INIT;
> +
> + if (!repo->gitdir)
> + BUG("gitdir wasn't set before setting up the environment");
> +
> + repo->different_commondir = find_common_dir(, repo->gitdir,
> + !repo->ignore_env);
> + repo->commondir = strbuf_detach(, NULL);
> + repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
> + "objects", !repo->ignore_env);
> + repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
> +  "info/grafts", !repo->ignore_env);
> + repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
> +  "index", !repo->ignore_env);
> +}

It seems that this function is only used once in repo_set_gitdir() -
could this be inlined there instead? Then you wouldn't need the comment
and the !repo->gitdir check.

> +static int verify_repo_format(struct repository_format *format,
> +   const char *commondir)
> +{
> + int ret = 0;
> + struct strbuf sb = STRBUF_INIT;
> +
> + strbuf_addf(, "%s/config", commondir);
> + read_repository_format(format, sb.buf);
> + strbuf_reset();
> +
> + if (verify_repository_format(format, ) < 0) {
> + warning("%s", sb.buf);
> + ret = -1;
> + }
> +
> + strbuf_release();
> + return ret;
> +}

This function is confusingly named - firstly, there is already a
verify_repository_format(), and secondly, this function both reads and
verifies it.

> +void repo_clear(struct repository *repo)
> +{
> + free(repo->gitdir);
> + repo->gitdir = NULL;
> + free(repo->commondir);
> + repo->commondir = NULL;
> + free(repo->objectdir);
> + repo->objectdir = NULL;
> + free(repo->graft_file);
> + repo->graft_file = NULL;
> + free(repo->index_file);
> + repo->index_file = NULL;
> + free(repo->worktree);
> + repo->worktree = NULL;
> +
> + memset(repo, 0, sizeof(*repo));
> +}

If you're going to memset, you probably don't need to set everything to
NULL.

> + /* Configurations */
> + /*
> +  * Bit used during initialization to indicate if repository state (like
> +  * the location of the 'objectdir') should be read from the
> +  * environment.  By default this bit will be set at the begining of
> +  * 'repo_init()' so that all repositories will ignore the environment.
> +  * The exception to this is 'the_repository', which doesn't go through
> +  * the normal 'repo_init()' process.
> +  */
> + unsigned ignore_env:1;

If this is only used during initialization, could this be passed as a
separate parameter internally instead of having it here?


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 12:19 PM, Brandon Williams  wrote:
> Introduce the repository object 'struct repository' which can be used to
> hold all state pertaining to a git repository.
>
> Some of the benefits of object-ifying a repository are:
>
>   1. Make the code base more readable and easier to reason about.
>
>   2. Allow for working on multiple repositories, specifically
>  submodules, within the same process.  Currently the process for
>  working on a submodule involves setting up an argv_array of options
>  for a particular command and then launching a child process to
>  execute the command in the context of the submodule.  This is
>  clunky and can require lots of little hacks in order to ensure
>  correctness.  Ideally it would be nice to simply pass a repository
>  and an options struct to a command.
>
>   3. Eliminating reliance on global state will make it easier to
>  enable the use of threading to improve performance.
>
> Signed-off-by: Brandon Williams 

> +/*
> + * Initialize 'repo' based on the provided 'gitdir'.
> + * Return 0 upon success and a non-zero value upon failure.

Non zero or negative? The point of the question is if we want to
ask users of this function to be cautious early on. So in the future,
do we want to rather see

if (repo_init(...))
die("you're doomed");

or rather

int x = repo_init(...);
if (x < 0)
die("you're doomed");
else if (x == 1)
warning("you're not doomed, but close."\
 "Not distimming the gostaks.")
else
; /* we're fine, carry on with life */

I guess we can still refactor later, it's just one
thing to thing about when introducing an API
that will likely be used a lot down the road.

> +struct repository {
> +   /* Environment */
> +   /* Path to the git directory */
> +   char *gitdir;
> +
> +   /* Path to the common git directory */
> +   char *commondir;
> +
> +   /* Path to the repository's object store */
> +   char *objectdir;
> +
> +   /* Path to the repository's graft file */
> +   char *graft_file;
> +
> +   /* Path to the current worktree's index file */
> +   char *index_file;
> +
> +   /* Path to the working directory */
> +   char *worktree;
> +
> +   /* Configurations */
> +   /*
> +* Bit used during initialization to indicate if repository state 
> (like
> +* the location of the 'objectdir') should be read from the
> +* environment.  By default this bit will be set at the begining of
> +* 'repo_init()' so that all repositories will ignore the environment.
> +* The exception to this is 'the_repository', which doesn't go through
> +* the normal 'repo_init()' process.
> +*/
> +   unsigned ignore_env:1;
> +
> +   /* Indicate if a repository has a different 'commondir' from 'gitdir' 
> */
> +   unsigned different_commondir:1;
> +};

I applaud the effort towards documenting what each variable is
supposed to contain. But some of them read like

/* increments i by one */
i++;

which is considered bad comment style (it doesn't add
more information, it just wastes a line), so specifically for
all the "Path to X" comments:
* Are they absolute path, or relative path?
  If relative, then relative to what?
* Can they be NULL? When?

(* Why do we need so many path?
Could one of them be constructed using
another and then hardcoding a string relative to it?
This question may rather be answered in the commit
message)


[PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Brandon Williams
Introduce the repository object 'struct repository' which can be used to
hold all state pertaining to a git repository.

Some of the benefits of object-ifying a repository are:

  1. Make the code base more readable and easier to reason about.

  2. Allow for working on multiple repositories, specifically
 submodules, within the same process.  Currently the process for
 working on a submodule involves setting up an argv_array of options
 for a particular command and then launching a child process to
 execute the command in the context of the submodule.  This is
 clunky and can require lots of little hacks in order to ensure
 correctness.  Ideally it would be nice to simply pass a repository
 and an options struct to a command.

  3. Eliminating reliance on global state will make it easier to
 enable the use of threading to improve performance.

Signed-off-by: Brandon Williams 
---
 Makefile |   1 +
 repository.c | 172 +++
 repository.h |  46 
 3 files changed, 219 insertions(+)
 create mode 100644 repository.c
 create mode 100644 repository.h

diff --git a/Makefile b/Makefile
index f48480163..32e4efc71 100644
--- a/Makefile
+++ b/Makefile
@@ -839,6 +839,7 @@ LIB_OBJS += refs/ref-cache.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
 LIB_OBJS += replace_object.o
+LIB_OBJS += repository.o
 LIB_OBJS += rerere.o
 LIB_OBJS += resolve-undo.o
 LIB_OBJS += revision.o
diff --git a/repository.c b/repository.c
new file mode 100644
index 0..7c8a7728c
--- /dev/null
+++ b/repository.c
@@ -0,0 +1,172 @@
+#include "cache.h"
+#include "repository.h"
+
+/* The main repository */
+static struct repository the_repo;
+struct repository *the_repository = _repo;
+
+static char *git_path_from_env(const char *envvar, const char *git_dir,
+  const char *path, int fromenv)
+{
+   if (fromenv) {
+   const char *value = getenv(envvar);
+   if (value)
+   return xstrdup(value);
+   }
+
+   return xstrfmt("%s/%s", git_dir, path);
+}
+
+static int find_common_dir(struct strbuf *sb, const char *gitdir, int fromenv)
+{
+   if (fromenv) {
+   const char *value = getenv(GIT_COMMON_DIR_ENVIRONMENT);
+   if (value) {
+   strbuf_addstr(sb, value);
+   return 1;
+   }
+   }
+
+   return get_common_dir_noenv(sb, gitdir);
+}
+
+/* called after setting gitdir */
+static void repo_setup_env(struct repository *repo)
+{
+   struct strbuf sb = STRBUF_INIT;
+
+   if (!repo->gitdir)
+   BUG("gitdir wasn't set before setting up the environment");
+
+   repo->different_commondir = find_common_dir(, repo->gitdir,
+   !repo->ignore_env);
+   repo->commondir = strbuf_detach(, NULL);
+   repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
+   "objects", !repo->ignore_env);
+   repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
+"info/grafts", !repo->ignore_env);
+   repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
+"index", !repo->ignore_env);
+}
+
+void repo_set_gitdir(struct repository *repo, const char *path)
+{
+   const char *gitfile = read_gitfile(path);
+
+   /*
+* NEEDSWORK: Eventually we want to be able to free gitdir and the rest
+* of the environment before reinitializing it again, but we have some
+* crazy code paths where we try to set gitdir with the current gitdir
+* and we don't want to free gitdir before copying the passed in value.
+*/
+   repo->gitdir = xstrdup(gitfile ? gitfile : path);
+
+   repo_setup_env(repo);
+}
+
+/*
+ * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+static int repo_init_gitdir(struct repository *repo, const char *gitdir)
+{
+   int ret = 0;
+   int error = 0;
+   char *abspath = NULL;
+   char *suspect = NULL;
+   const char *resolved_gitdir;
+
+   abspath = real_pathdup(gitdir, 0);
+   if (!abspath) {
+   ret = -1;
+   goto out;
+   }
+
+   /* First assume 'gitdir' references the gitdir directly */
+   resolved_gitdir = resolve_gitdir_gently(abspath, );
+   /* otherwise; try 'gitdir'.git */
+   if (!resolved_gitdir) {
+   suspect = xstrfmt("%s/.git", abspath);
+   resolved_gitdir = resolve_gitdir_gently(suspect, );
+   if (!resolved_gitdir) {
+   ret = -1;
+   goto out;
+   }
+   }
+
+   repo_set_gitdir(repo, resolved_gitdir);
+