Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-26 Thread Paul Tan
On Thu, Jun 25, 2015 at 12:36 AM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 On 2015-06-18 13:25, Paul Tan wrote:
 diff --git a/builtin/am.c b/builtin/am.c
 index 7b97ea8..d6434e4 100644
 --- a/builtin/am.c
 +++ b/builtin/am.c
 @@ -94,6 +126,105 @@ static int read_state_file(struct strbuf *sb,
 const char *file, size_t hint, int
  }

  /**
 + * Reads a KEY=VALUE shell variable assignment from fp, and returns the 
 VALUE
 + * in `value`. VALUE must be a quoted string, and the KEY must match `key`.
 + * Returns 0 on success, -1 on failure.
 + *
 + * This is used by read_author_script() to read the GIT_AUTHOR_* variables 
 from
 + * the author-script.
 + */
 +static int read_shell_var(struct strbuf *value, FILE *fp, const char *key)
 +{
 + struct strbuf sb = STRBUF_INIT;
 + char *str;
 +
 + if (strbuf_getline(sb, fp, '\n'))
 + return -1;
 +
 + if (!skip_prefix(sb.buf, key, (const char **)str))
 + return -1;
 +
 + if (!skip_prefix(str, =, (const char **)str))
 + return -1;
 +
 + str = sq_dequote(str);
 + if (!str)
 + return -1;
 +
 + strbuf_reset(value);
 + strbuf_addstr(value, str);
 +
 + strbuf_release(sb);
 +
 + return 0;
 +}

 How about using `strbuf_remove()` and keeping `str` as `const char *`?

OK, I'll try that out. Looks like this now:

static char *read_shell_var(FILE *fp, const char *key)
{
struct strbuf sb = STRBUF_INIT;
const char *str;

if (strbuf_getline(sb, fp, '\n'))
return NULL;

if (!skip_prefix(sb.buf, key, str))
return NULL;

if (!skip_prefix(str, =, str))
return NULL;

strbuf_remove(sb, 0, str - sb.buf);

str = sq_dequote(sb.buf);
if (!str)
return NULL;

return strbuf_detach(sb, NULL);
}

 I also think we can fold it into the `read_author_script()` function and make 
 it more resilient with regards to the order of the variables. Something like 
 this:
 [...]

Hmm, I think we should be very strict about parsing the author-script
file though. As explained in read_author_script(), git-am.sh evals the
author-script, which we can't in C. I would much rather we barf at the
first sign that the author-script is not what we expect, rather than
attempt to parse it as much as possible, but end up with the wrong
results as compared to git-am.sh.

Besides, currently git-am.sh will always write the author-script with
the order of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and GIT_AUTHOR_DATE. If
the order is wrong, I would think it means that something is messing
with the author-script, and it would be better if we bail out
immediately, instead of potentially doing something wrong.

 +/**
 + * Saves state-author_name, state-author_email and state-author_date in
 + * `filename` as an author script, which is the format used by git-am.sh.
 + */
 +static void write_author_script(const struct am_state *state)
 +{
 + static const char fmt[] = GIT_AUTHOR_NAME=%s\n
 + GIT_AUTHOR_EMAIL=%s\n
 + GIT_AUTHOR_DATE=%s\n;
 + struct strbuf author_name = STRBUF_INIT;
 + struct strbuf author_email = STRBUF_INIT;
 + struct strbuf author_date = STRBUF_INIT;
 +
 + sq_quote_buf(author_name, state-author_name.buf);
 + sq_quote_buf(author_email, state-author_email.buf);
 + sq_quote_buf(author_date, state-author_date.buf);

 The `sq_quote_buf()` function does not call `strbuf_reset()`. Therefore you 
 could just use a single strbuf to construct the entire three lines and then 
 write that out. Again, if you follow my suggestion to keep a scratch pad 
 strbuf in am_state, you could reuse that.

Right, makes sense. I've implemented it on my end.

Thanks,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-26 Thread Junio C Hamano
Paul Tan pyoka...@gmail.com writes:

 OK, I'll try that out. Looks like this now:

 static char *read_shell_var(FILE *fp, const char *key)
 {
 ...
 str = sq_dequote(sb.buf);
 if (!str)
 return NULL;

You are unlikely to get !str, but if it does, you leak sb here,
don't you?

 return strbuf_detach(sb, NULL);

This call is OK; if you passed the length to detach, you're likely
to get a wrong result, though ;-)

sq_dequote() is one of the older parts of the API set and its we
know it cannot do anything but shrink, so we'd do it in-place
attitude, which may be vastly useful in practice, is now showing
some impedance mismatch with newer parts of the API like strbuf.

 +/**
 + * Saves state-author_name, state-author_email and state-author_date in
 + * `filename` as an author script, which is the format used by git-am.sh.
 + */
 +static void write_author_script(const struct am_state *state)
 +{
 + static const char fmt[] = GIT_AUTHOR_NAME=%s\n
 + GIT_AUTHOR_EMAIL=%s\n
 + GIT_AUTHOR_DATE=%s\n;
 + struct strbuf author_name = STRBUF_INIT;
 + struct strbuf author_email = STRBUF_INIT;
 + struct strbuf author_date = STRBUF_INIT;
 +
 + sq_quote_buf(author_name, state-author_name.buf);
 + sq_quote_buf(author_email, state-author_email.buf);
 + sq_quote_buf(author_date, state-author_date.buf);

 The `sq_quote_buf()` function does not call
 strbuf_reset()`. Therefore you could just use a single strbuf to
 construct the entire three lines and then write that out.

Yup.  quote appends to the output, so you could do this:

add(out, GIT_AUTHOR_NAME=);
quote(out, state-author_name);
add(out, \\nGIT_AUTHOR_EMAIL=);
quote(out, state-author_email);
...

I am not sure if that is easier to read than what you have, though.

 Again, if
 you follow my suggestion to keep a scratch pad strbuf in am_state,
 you could reuse that.

Don't do scratch pad in a structure that is passed around to
various people.  Somebody may be tempted to use the scratch pad
while he has the control, but as soon as he becomes complex enough
to require calling some helper functions, the ownership rules of the
scratch pad will become cumbersome to manage and understandable only
by the person who originally wrote the codepath.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-25 Thread Paul Tan
On Wed, Jun 24, 2015 at 11:59 PM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 3. I'm over-thinking this and you just want the struct strbufs in the
struct am_state to be switched to char*s?

 Yes, everybody interacts with am_state, and these fields are
 supposed to be constant during the processing of each patch input
 message; they should be simple strings, not strbufs, to make sure if
 anybody _does_ muck with them in-place, that would be very visible.
 The helpers to initialize them are free to use strbuf API to prepare
 these simple string fields, of course.

OK. I've implemented it on my end.

 (On a somewhat related thought, currently we do write_file() all over
 the place, which is really ugly. I'm leaning heavily on introducing an
 am_save() function, for I don't care how it is done but just update the
 contents of the am state directory so that it matches the contents of
 the struct am_state.

 Sure; the scripted Porcelain may have done echo here, echo there
 instead of concatenate into a $var and then 'echo $var' at end as
 that is more natural way to program in that environment.  You are
 doing this in C and prepare the thing in-core and write it all at
 the point to snapshot may well be the more natural way to program.

 As long as a process that stops in the middle does not leave on-disk
 state inconsistent, batching would be fine.  For example, you may
 apply and commit two (or more) patches without updating the on-disk
 state as you do not see need to give control back to the user
 (i.e. they applied cleanly) and then write out the on-disk state
 with .next incremented by two (or more) before giving the control
 back could be a valid optimization (take this example with a grain
 of salt, though; I haven't thought too deeply about what should
 happen if you Ctrl-C the process in the middle).

Right, I briefly wondered if we could hold off writing the am_state to
disk as much as possible, and only write to disk when we are about to
terminate. This would involve installing an atexit() and SIGTERM
signal handler, but I haven't thought too deeply about that.

Anyway, moving all the writing am_state to disk logic to a central
function am_save() would be a good immediate step, I think, so I've
implemented it on my end.

Thanks,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-24 Thread Paul Tan
Okay, let's focus only on the API design issues.

For the record, I'm not completely satisfied with the current code
organization and API, however I don't have really good ideas at hand to
improve it, so any ideas with examples would be appreciated.

On Fri, Jun 19, 2015 at 09:13:00AM -0700, Junio C Hamano wrote:
 Paul Tan pyoka...@gmail.com writes:
 With the above fields, it is clear that the above fields are
 per-message thing.  So the loop to process multiple messages is
 conceptually:
 
 
   set up the entire am_state (for things like cur=1, last=N)
 for each message {
   update per-message fields like cur, author, msg, etc.
   process the single message
 clean up per-message fileds like cur++, free(msg), etc.
   }
   tear down the entire am_state
 
 Reusing the same strbuf and rely on them to clean up after
 themselves is simply a bad taste.

Hmm, reading reusing the same strbuf makes me wonder if I'm
misunderstanding something. Do you mean that:

1. We should release the memory in state-msg, state-author_* after
   processing each message? Because that is what am_next() already does.
   Granted, it does strbuf_reset(), but it could just as well be
   strbuf_release().

or

2. The per-message fields (state-msg, state-author_*) should become
   local variables in am_run()?

or

3. I'm over-thinking this and you just want the struct strbufs in the
   struct am_state to be switched to char*s? If that is so, I've
   attached a sample patch below.

For (2), my idea is that am_state represents the contents of the state
directory at any point in time -- a design feature carried over from
git-am.sh.  This is why parse_patch() modifies the am_state struct
directly -- the msg and author_* fields would be written to the
final-commit and author-script files directly after.

I think it would be confusing if we don't update the am_state struct
directly, as we will never know if the am_state struct contains the
actual current state of the patch application session.

(On a somewhat related thought, currently we do write_file() all over
the place, which is really ugly. I'm leaning heavily on introducing an
am_save() function, for I don't care how it is done but just update the
contents of the am state directory so that it matches the contents of
the struct am_state. To save pointless writes, we can compare the
am_state with the last-written/loaded am_state, so we only write the
files that have changed. What do others think?)

 More importantly, we'd want to make sure that people who will be
 touching the process the single message part in the above loop to
 think twice before mucking with read-only fields like author.

Okay, I understand and agree with your reasoning.

However, note that this process the single message part consists
solely of run_apply() and do_commit(). Both of them take a
const struct am_state*, which means that the compiler will warn/fail
if anybody tries to touch any of the fields in the am_state. This
extends to strbufs as well.  Isn't this already safe enough? Or do you
want this safety to extend throughout am_run() as well?

 If they are char *, they would need to allocate new storage
 themselves, format a new value into there, free the original, and
 replace the field with the new value.  It takes a conscious effort
 and very much visible code and would be clear to reviewers what is
 going on, and gives them a chance to question if it is a good idea
 to update things in-place in the first place.

Okay, although personally I think reviewers may tend to miss out that a
single line like:

state-msg = strbuf_detach(sb, NULL);

introduces a memory leak.

 If you left it in strbuf, that invites let's extend it temporarily
 with strbuf_add() and then return to the original once I am done
 with this single step, which is an entry to a slippery slope to
 let's extend it with strbuf_add() for my purpose, and I do not even
 bother to clean up because I know I am the last person to touch
 this.
 
 So, no, please don't leave a field that won't change during the bulk
 of the processing in strbuf, unless you have a compelling reason to
 do so (and compelling reasons are pretty much limited to after
 all, that field we originally thought it won't change is something
 we need to change very often).

Anyway, assuming that you just want the strbufs in the am_state struct
to be switched to char*s, here's a quick diff of the result. Can I
confirm that this is the direction that you want to take? (since the
changes are throughout the entire patch series)

Regards,
Paul

--- 8 ---
 builtin/am.c | 202 ++-
 1 file changed, 131 insertions(+), 71 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index f55dd14..944e35a 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -98,10 +98,10 @@ struct am_state {
int last;
 
/* commit message and metadata */
-   struct strbuf author_name;
- 

Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-24 Thread Junio C Hamano
Paul Tan pyoka...@gmail.com writes:

 3. I'm over-thinking this and you just want the struct strbufs in the
struct am_state to be switched to char*s?

Yes, everybody interacts with am_state, and these fields are
supposed to be constant during the processing of each patch input
message; they should be simple strings, not strbufs, to make sure if
anybody _does_ muck with them in-place, that would be very visible.
The helpers to initialize them are free to use strbuf API to prepare
these simple string fields, of course.

 (On a somewhat related thought, currently we do write_file() all over
 the place, which is really ugly. I'm leaning heavily on introducing an
 am_save() function, for I don't care how it is done but just update the
 contents of the am state directory so that it matches the contents of
 the struct am_state.

Sure; the scripted Porcelain may have done echo here, echo there
instead of concatenate into a $var and then 'echo $var' at end as
that is more natural way to program in that environment.  You are
doing this in C and prepare the thing in-core and write it all at
the point to snapshot may well be the more natural way to program.

As long as a process that stops in the middle does not leave on-disk
state inconsistent, batching would be fine.  For example, you may
apply and commit two (or more) patches without updating the on-disk
state as you do not see need to give control back to the user
(i.e. they applied cleanly) and then write out the on-disk state
with .next incremented by two (or more) before giving the control
back could be a valid optimization (take this example with a grain
of salt, though; I haven't thought too deeply about what should
happen if you Ctrl-C the process in the middle).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-24 Thread Johannes Schindelin
Hi Paul,

On 2015-06-18 13:25, Paul Tan wrote:

 diff --git a/builtin/am.c b/builtin/am.c
 index 7b97ea8..d6434e4 100644
 --- a/builtin/am.c
 +++ b/builtin/am.c
 @@ -94,6 +126,105 @@ static int read_state_file(struct strbuf *sb,
 const char *file, size_t hint, int
  }
  
  /**
 + * Reads a KEY=VALUE shell variable assignment from fp, and returns the VALUE
 + * in `value`. VALUE must be a quoted string, and the KEY must match `key`.
 + * Returns 0 on success, -1 on failure.
 + *
 + * This is used by read_author_script() to read the GIT_AUTHOR_* variables 
 from
 + * the author-script.
 + */
 +static int read_shell_var(struct strbuf *value, FILE *fp, const char *key)
 +{
 + struct strbuf sb = STRBUF_INIT;
 + char *str;
 +
 + if (strbuf_getline(sb, fp, '\n'))
 + return -1;
 +
 + if (!skip_prefix(sb.buf, key, (const char **)str))
 + return -1;
 +
 + if (!skip_prefix(str, =, (const char **)str))
 + return -1;
 +
 + str = sq_dequote(str);
 + if (!str)
 + return -1;
 +
 + strbuf_reset(value);
 + strbuf_addstr(value, str);
 +
 + strbuf_release(sb);
 +
 + return 0;
 +}

How about using `strbuf_remove()` and keeping `str` as `const char *`? I also 
think we can fold it into the `read_author_script()` function and make it more 
resilient with regards to the order of the variables. Something like this:

static int read_author_script(struct am_state *state)
{
struct strbuf sb = STRBUF_INIT;
const char *filename = am_path(state, author-script);
FILE *fp = fopen(filename, r);
if (!fp) {
if (errno == ENOENT)
return 0;
die_errno(_(could not open '%s' for reading), filename);
}

while (!strbuf_getline(sb, fp, '\n')) {
char *equal = strchr(sb.buf, '='), **var;

if (!equal) {
error:
fclose(fp);
return -1;
}
*equal = '\0';
if (!strcmp(sb.buf, GIT_AUTHOR_NAME))
var = state-author_name;
else if (!strcmp(sb.buf, GIT_AUTHOR_EMAIL))
var = state-author_email;
else if (!strcmp(sb.buf, GIT_AUTHOR_DATE))
var = state-author_date;
else
goto error;
*var = xstrdup(sq_dequote(equal + 1));
}

fclose(fp);
return -1;
}

If you follow my earlier suggestion to keep a strbuf inside the am_state, you 
could reuse that here, too.

 +/**
 + * Saves state-author_name, state-author_email and state-author_date in
 + * `filename` as an author script, which is the format used by git-am.sh.
 + */
 +static void write_author_script(const struct am_state *state)
 +{
 + static const char fmt[] = GIT_AUTHOR_NAME=%s\n
 + GIT_AUTHOR_EMAIL=%s\n
 + GIT_AUTHOR_DATE=%s\n;
 + struct strbuf author_name = STRBUF_INIT;
 + struct strbuf author_email = STRBUF_INIT;
 + struct strbuf author_date = STRBUF_INIT;
 +
 + sq_quote_buf(author_name, state-author_name.buf);
 + sq_quote_buf(author_email, state-author_email.buf);
 + sq_quote_buf(author_date, state-author_date.buf);

The `sq_quote_buf()` function does not call `strbuf_reset()`. Therefore you 
could just use a single strbuf to construct the entire three lines and then 
write that out. Again, if you follow my suggestion to keep a scratch pad 
strbuf in am_state, you could reuse that.

That scratch pad could come in handy in a couple of other places in the rest of 
this patch.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Johannes Schindelin
Hi Paul,

On 2015-06-19 18:15, Paul Tan wrote:
 On Fri, Jun 19, 2015 at 11:09 PM, Junio C Hamano gits...@pobox.com wrote:
 
 The primary thing I care about is to discourage callers of the API element
 am_state from touching these fields with strbuf functions. If it is char * 
 then
 the would think twice before modifying (which would involve allocating the
 new buffer, forming the new value and assigning into it). With the fields 
 left
 as strbuf after they are read or formed by the API (by reading by the API
 implementation from $GIT_DIR/rebase-apply/), it is too tempting to do
 strbuf_add(), strbuf_trim(), etc., without restoring the value to the 
 original
 saying I'm the last user of it anyway--that is the sloppiness we can 
 prevent
 by not leaving it as strbuf.
 
 I don't think this is a good deterrent. If the code wanted to, it
 could just use strbuf_attach()/strbuf_detach() as well, no?

Sadly, I am a bit too busy with some loose Git for Windows ends, so I haven't 
had the chance to look at your code.

But I still wanted to throw this idea out there: would it maybe possible to 
avoid having those values as global variables, and instead pass them as const 
char * parameters to the respective functions? That should help resolve the 
concerns of both sides because it would allow us to keep the strbuf instances, 
just as local variables.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 5:10 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 + /* commit message and metadata */
 + struct strbuf author_name;
 + struct strbuf author_email;
 + struct strbuf author_date;
 + struct strbuf msg;

 Same comment as dir in the earlier patch applies to these.  If the
 fields are read or computed and then kept constant, use a temporary
 variable that is a strbuf to read/compute the final value, and then
 detach to a const char * field.  If they are constantly changing
 and in-place updates are vital, then they can and should be strbufs,
 but I do not think that is the case for these.

I do think it is the case here. The commit message and metadata fields
change for every patch we process, and we could be processing a large
volume of patches, so we must be careful to not leak any memory.

We could use a char* field, but then to prevent memory leaks we would
then have to free() it and malloc() a new string for every patch we
process, which will lead to lots of malloc()s and free()s over a large
volume of patches that can lead to memory fragmentation.

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 11:09 PM, Junio C Hamano gits...@pobox.com wrote:
 You do realize that strbuf internally does alloc/free so as a solution to
 fragmentation issue you are at the mercy of the same alloc/free, don't you?

Yes, of course, but it has the alloc variable to keep track of the
size of the allocated buffer, and provided that ALLOC_GROW() uses a
sensible factor to grow the buffer, it won't be calling malloc/free
that much to be a problem.

Of course, we could do the same and just add a alloc variable as
well and then use it with strbuf_attach()/strbuf_detach(), but at this
point why not just store it in an strbuf then and avoid the extra
alloc struct member?

 The primary thing I care about is to discourage callers of the API element
 am_state from touching these fields with strbuf functions. If it is char * 
 then
 the would think twice before modifying (which would involve allocating the
 new buffer, forming the new value and assigning into it). With the fields left
 as strbuf after they are read or formed by the API (by reading by the API
 implementation from $GIT_DIR/rebase-apply/), it is too tempting to do
 strbuf_add(), strbuf_trim(), etc., without restoring the value to the original
 saying I'm the last user of it anyway--that is the sloppiness we can prevent
 by not leaving it as strbuf.

I don't think this is a good deterrent. If the code wanted to, it
could just use strbuf_attach()/strbuf_detach() as well, no?

I believe this kind of issue would be better solved with a big WARNING
comment and code review.

Also, there may be use cases where the user may wish to modify the
commit message/author fields. E.g., user may wish to modify the
message of the commit that results from am --continue to take into
account the conflicts that caused the am to fail.

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Junio C Hamano
Paul Tan pyoka...@gmail.com writes:

 On Fri, Jun 19, 2015 at 5:10 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 + /* commit message and metadata */
 + struct strbuf author_name;
 + struct strbuf author_email;
 + struct strbuf author_date;
 + struct strbuf msg;

 Same comment as dir in the earlier patch applies to these.  If the
 fields are read or computed and then kept constant, use a temporary
 variable that is a strbuf to read/compute the final value, and then
 detach to a const char * field.  If they are constantly changing
 and in-place updates are vital, then they can and should be strbufs,
 but I do not think that is the case for these.

 I do think it is the case here. The commit message and metadata fields
 change for every patch we process, and we could be processing a large
 volume of patches, so we must be careful to not leak any memory.

With the above fields, it is clear that the above fields are
per-message thing.  So the loop to process multiple messages is
conceptually:


set up the entire am_state (for things like cur=1, last=N)
for each message {
update per-message fields like cur, author, msg, etc.
process the single message
clean up per-message fileds like cur++, free(msg), etc.
}
tear down the entire am_state

Reusing the same strbuf and rely on them to clean up after
themselves is simply a bad taste.

More importantly, we'd want to make sure that people who will be
touching the process the single message part in the above loop to
think twice before mucking with read-only fields like author.

If they are char *, they would need to allocate new storage
themselves, format a new value into there, free the original, and
replace the field with the new value.  It takes a conscious effort
and very much visible code and would be clear to reviewers what is
going on, and gives them a chance to question if it is a good idea
to update things in-place in the first place.

If you left it in strbuf, that invites let's extend it temporarily
with strbuf_add() and then return to the original once I am done
with this single step, which is an entry to a slippery slope to
let's extend it with strbuf_add() for my purpose, and I do not even
bother to clean up because I know I am the last person to touch
this.

So, no, please don't leave a field that won't change during the bulk
of the processing in strbuf, unless you have a compelling reason to
do so (and compelling reasons are pretty much limited to after
all, that field we originally thought it won't change is something
we need to change very often).

Thanks.


--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-18 Thread Paul Tan
For the purpose of applying the patch and committing the results,
implement extracting the patch data, commit message and authorship from
an e-mail message using git-mailinfo.

git-mailinfo is run as a separate process, but ideally in the future,
we should be be able to access its functionality directly without
spawning a new process.

Helped-by: Junio C Hamano gits...@pobox.com
Helped-by: Jeff King p...@peff.net
Signed-off-by: Paul Tan pyoka...@gmail.com
---
Notes:
v3

* Style fixes

 builtin/am.c | 232 +++
 1 file changed, 232 insertions(+)

diff --git a/builtin/am.c b/builtin/am.c
index 7b97ea8..d6434e4 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -9,6 +9,23 @@
 #include parse-options.h
 #include dir.h
 #include run-command.h
+#include quote.h
+
+/**
+ * Returns 1 if the file is empty or does not exist, 0 otherwise.
+ */
+static int is_empty_file(const char *filename)
+{
+   struct stat st;
+
+   if (stat(filename, st)  0) {
+   if (errno == ENOENT)
+   return 1;
+   die_errno(_(could not stat %s), filename);
+   }
+
+   return !st.st_size;
+}
 
 enum patch_format {
PATCH_FORMAT_UNKNOWN = 0,
@@ -23,6 +40,12 @@ struct am_state {
int cur;
int last;
 
+   /* commit message and metadata */
+   struct strbuf author_name;
+   struct strbuf author_email;
+   struct strbuf author_date;
+   struct strbuf msg;
+
/* number of digits in patch filename */
int prec;
 };
@@ -36,6 +59,11 @@ static void am_state_init(struct am_state *state)
 
strbuf_init(state-dir, 0);
 
+   strbuf_init(state-author_name, 0);
+   strbuf_init(state-author_email, 0);
+   strbuf_init(state-author_date, 0);
+   strbuf_init(state-msg, 0);
+
state-prec = 4;
 }
 
@@ -45,6 +73,10 @@ static void am_state_init(struct am_state *state)
 static void am_state_release(struct am_state *state)
 {
strbuf_release(state-dir);
+   strbuf_release(state-author_name);
+   strbuf_release(state-author_email);
+   strbuf_release(state-author_date);
+   strbuf_release(state-msg);
 }
 
 /**
@@ -94,6 +126,105 @@ static int read_state_file(struct strbuf *sb, const char 
*file, size_t hint, int
 }
 
 /**
+ * Reads a KEY=VALUE shell variable assignment from fp, and returns the VALUE
+ * in `value`. VALUE must be a quoted string, and the KEY must match `key`.
+ * Returns 0 on success, -1 on failure.
+ *
+ * This is used by read_author_script() to read the GIT_AUTHOR_* variables from
+ * the author-script.
+ */
+static int read_shell_var(struct strbuf *value, FILE *fp, const char *key)
+{
+   struct strbuf sb = STRBUF_INIT;
+   char *str;
+
+   if (strbuf_getline(sb, fp, '\n'))
+   return -1;
+
+   if (!skip_prefix(sb.buf, key, (const char **)str))
+   return -1;
+
+   if (!skip_prefix(str, =, (const char **)str))
+   return -1;
+
+   str = sq_dequote(str);
+   if (!str)
+   return -1;
+
+   strbuf_reset(value);
+   strbuf_addstr(value, str);
+
+   strbuf_release(sb);
+
+   return 0;
+}
+
+/**
+ * Parses the author script `filename`, and sets state-author_name,
+ * state-author_email and state-author_date accordingly. We are strict with
+ * our parsing, as the author script is supposed to be eval'd, and loosely
+ * parsing it may not give the results the user expects.
+ *
+ * The author script is of the format:
+ *
+ * GIT_AUTHOR_NAME='$author_name'
+ * GIT_AUTHOR_EMAIL='$author_email'
+ * GIT_AUTHOR_DATE='$author_date'
+ *
+ * where $author_name, $author_email and $author_date are quoted.
+ */
+static int read_author_script(struct am_state *state)
+{
+   const char *filename = am_path(state, author-script);
+   FILE *fp = fopen(filename, r);
+   if (!fp) {
+   if (errno == ENOENT)
+   return 0;
+   die_errno(_(could not open '%s' for reading), filename);
+   }
+
+   if (read_shell_var(state-author_name, fp, GIT_AUTHOR_NAME))
+   return -1;
+
+   if (read_shell_var(state-author_email, fp, GIT_AUTHOR_EMAIL))
+   return -1;
+
+   if (read_shell_var(state-author_date, fp, GIT_AUTHOR_DATE))
+   return -1;
+
+   if (fgetc(fp) != EOF)
+   return -1;
+
+   fclose(fp);
+   return 0;
+}
+
+/**
+ * Saves state-author_name, state-author_email and state-author_date in
+ * `filename` as an author script, which is the format used by git-am.sh.
+ */
+static void write_author_script(const struct am_state *state)
+{
+   static const char fmt[] = GIT_AUTHOR_NAME=%s\n
+   GIT_AUTHOR_EMAIL=%s\n
+   GIT_AUTHOR_DATE=%s\n;
+   struct strbuf author_name = STRBUF_INIT;
+   struct strbuf author_email = STRBUF_INIT;
+   struct strbuf author_date = STRBUF_INIT;
+
+   

Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-18 Thread Junio C Hamano
Paul Tan pyoka...@gmail.com writes:

 + /* commit message and metadata */
 + struct strbuf author_name;
 + struct strbuf author_email;
 + struct strbuf author_date;
 + struct strbuf msg;

Same comment as dir in the earlier patch applies to these.  If the
fields are read or computed and then kept constant, use a temporary
variable that is a strbuf to read/compute the final value, and then
detach to a const char * field.  If they are constantly changing
and in-place updates are vital, then they can and should be strbufs,
but I do not think that is the case for these.

For example...

 +/**
 + * Saves state-author_name, state-author_email and state-author_date in
 + * `filename` as an author script, which is the format used by git-am.sh.
 + */
 +static void write_author_script(const struct am_state *state)
 +{
 + static const char fmt[] = GIT_AUTHOR_NAME=%s\n
 + GIT_AUTHOR_EMAIL=%s\n
 + GIT_AUTHOR_DATE=%s\n;
 + struct strbuf author_name = STRBUF_INIT;
 + struct strbuf author_email = STRBUF_INIT;
 + struct strbuf author_date = STRBUF_INIT;
 +
 + sq_quote_buf(author_name, state-author_name.buf);
 + sq_quote_buf(author_email, state-author_email.buf);
 + sq_quote_buf(author_date, state-author_date.buf);

As you use a separate author_name variable here, what gets sq-quoted
that is in *state does not have to be strbuf.

The code to read is the same story:

 +static int read_shell_var(struct strbuf *value, FILE *fp, const char *key)
 +{
 + struct strbuf sb = STRBUF_INIT;
 + char *str;
 +
 + if (strbuf_getline(sb, fp, '\n'))
 + return -1;
 +...
 + strbuf_reset(value);
 + strbuf_addstr(value, str);
 +
 + strbuf_release(sb);
 +
 + return 0;
 +}

As you use a separate sb strbuf variable here, there is no need for
value to be pointing at a strbuf; it could be char ** that sb's
contents is detached into.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html