Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-06 Thread Jeff King
On Wed, Sep 06, 2017 at 08:12:24PM +0200, Martin Ågren wrote:

> On 5 September 2017 at 23:26, Junio C Hamano  wrote:
> > Jeff King  writes:
> >
> >> I noticed the HEAD funniness, too, when looking at this earlier. I agree
> >> with Junio that it's not quite consistent with the general rule of
> >> "string list items point to their refnames", but I don't think it
> >> matters in practice.
> >
> > Yup, we are on the same page; the "fix" I was alluding to would look
> > exactly like what you wrote below, but I agree the distinction does
> > not matter in practice.  IOW, I do not think the code after Martin's
> > fix is wrong per-se.
> 
> Well, "not wrong per-se" tells me you'd feel slightly more comfortable
> about the state of things if we did this. ;)
> 
> I'll take Peff's hint, tweak/add comments for correctness and symmetry
> with the previous patch and add an if-BUG for symmetry. Peff: Do I have
> your sign-off? (Do I need it?)

Yes, you have my sign-off. Probably it is not necessary for such a
trivial patch, but it never hurts to be sure.

> If we re-roll, would you prefer Peff's much smaller take on patch 2
> (strbuf_release where it matters, instead of sprinkling "goto out" all
> over)? I think me and him agreed that we'd be fine either way. I'd reuse
> my commit message, if I get his sign-off and "From:".

You are welcome to forge my sign-off there (but I really am OK with
either approach).

-Peff


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-06 Thread Junio C Hamano
Martin Ågren  writes:

> Junio: The topic is in pu as ma/split-symref-update-fix. Re-roll or new
> topic or as a separate "patch 4/3" for you to place on top, any
> preference?

Until a topic hits 'next', you solely own it and it is mostly up to
you how to go about improving it.  My preference would be much less
relevant than what the end result would require, i.e

> If we re-roll, would you prefer Peff's much smaller take on patch 2
> (strbuf_release where it matters, instead of sprinkling "goto out" all
> over)? I think me and him agreed that we'd be fine either way. I'd reuse
> my commit message, if I get his sign-off and "From:".

... if we take Peff's approach, then rerolling the whole thing would
be the only approach that makes sense---incremental update would
make the resulting history less readable.  Between two approaches I
do not have a strong preference either way---it's a choice that can
be made between you and Peff.

> Obviously, if Michael or anyone else has any input, I'm open to that as
> well..

Sure.


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-06 Thread Martin Ågren
On 5 September 2017 at 23:26, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I noticed the HEAD funniness, too, when looking at this earlier. I agree
>> with Junio that it's not quite consistent with the general rule of
>> "string list items point to their refnames", but I don't think it
>> matters in practice.
>
> Yup, we are on the same page; the "fix" I was alluding to would look
> exactly like what you wrote below, but I agree the distinction does
> not matter in practice.  IOW, I do not think the code after Martin's
> fix is wrong per-se.

Well, "not wrong per-se" tells me you'd feel slightly more comfortable
about the state of things if we did this. ;)

I'll take Peff's hint, tweak/add comments for correctness and symmetry
with the previous patch and add an if-BUG for symmetry. Peff: Do I have
your sign-off? (Do I need it?)

Junio: The topic is in pu as ma/split-symref-update-fix. Re-roll or new
topic or as a separate "patch 4/3" for you to place on top, any
preference?

If we re-roll, would you prefer Peff's much smaller take on patch 2
(strbuf_release where it matters, instead of sprinkling "goto out" all
over)? I think me and him agreed that we'd be fine either way. I'd reuse
my commit message, if I get his sign-off and "From:".

Obviously, if Michael or anyone else has any input, I'm open to that as
well..

Martin


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Junio C Hamano
Jeff King  writes:

> I noticed the HEAD funniness, too, when looking at this earlier. I agree
> with Junio that it's not quite consistent with the general rule of
> "string list items point to their refnames", but I don't think it
> matters in practice.

Yup, we are on the same page; the "fix" I was alluding to would look
exactly like what you wrote below, but I agree the distinction does
not matter in practice.  IOW, I do not think the code after Martin's
fix is wrong per-se.

Thanks.

> I think the fix, if we wanted to do one, would be similar to what you
> did in split_symref_update(). Like:
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index f3455609d6..3f9deff902 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2095,8 +2095,7 @@ static int split_head_update(struct ref_update *update,
>* transaction. This insertion is O(N) in the transaction
>* size, but it happens at most once per transaction.
>*/
> - item = string_list_insert(affected_refnames, "HEAD");
> - if (item->util) {
> + if (string_list_has_string(affected_refnames, "HEAD")) {
>   /* An entry already existed */
>   strbuf_addf(err,
>   "multiple updates for 'HEAD' (including one "
> @@ -2111,6 +2110,7 @@ static int split_head_update(struct ref_update *update,
>   update->new_oid.hash, update->old_oid.hash,
>   update->msg);
>  
> + item = string_list_insert(affected_refnames, new_update->refname);
>   item->util = new_update;
>  
>   return 0;
>
> -Peff


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Jeff King
On Tue, Sep 05, 2017 at 07:24:18PM +0200, Martin Ågren wrote:

> > And from that point of view, doesn't split_head_update() wants a
> > similar fix?  It attempts to insert "HEAD", makes sure it hasn't
> > been inserted and then hangs a new update transaction as its util.
> > It is not wrong per-se from purely leak-prevention point of view,
> > as that "HEAD" is a literal string we woudn't even want to free,
> > but from logical/"what each data means" point of view, it still
> > feels wrong.
> 
> There is a "Special hack" comment related to this, and I don't feel
> particularly confident that I could make any meaningful contribution in
> this area. To be honest, I don't immediately see in which direction your
> suggestion/idea/thought is going, which tells me I should not be making
> a mess out of it. :-)

I noticed the HEAD funniness, too, when looking at this earlier. I agree
with Junio that it's not quite consistent with the general rule of
"string list items point to their refnames", but I don't think it
matters in practice.

I think the fix, if we wanted to do one, would be similar to what you
did in split_symref_update(). Like:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f3455609d6..3f9deff902 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2095,8 +2095,7 @@ static int split_head_update(struct ref_update *update,
 * transaction. This insertion is O(N) in the transaction
 * size, but it happens at most once per transaction.
 */
-   item = string_list_insert(affected_refnames, "HEAD");
-   if (item->util) {
+   if (string_list_has_string(affected_refnames, "HEAD")) {
/* An entry already existed */
strbuf_addf(err,
"multiple updates for 'HEAD' (including one "
@@ -2111,6 +2110,7 @@ static int split_head_update(struct ref_update *update,
update->new_oid.hash, update->old_oid.hash,
update->msg);
 
+   item = string_list_insert(affected_refnames, new_update->refname);
item->util = new_update;
 
return 0;

-Peff


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Martin Ågren
On 5 September 2017 at 12:02, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> On 30 August 2017 at 04:52, Michael Haggerty  wrote:
>>> v3 looks good to me. Thanks!
>>>
>>> Reviewed-by: Michael Haggerty 
>>
>> Thank _you_ for very helpful feedback on the earlier versions.
>>
>> Martin
>
> Yes, the earlier attempt was sort-of barking up a wrong tree.
>
> Once we step back and observe other users of affected_refnames and
> realize that the list is meant to to point at a refname field of an
> existing instance of another structure by borrowing, the blame
> shifts from files_transaction_prepare() to the real culprit.
> Michael's review gave us a very good "let's step back a bit" that
> made the huge improvement between v1 and v2/v3.
>
> I wonder if we should be tightening the use of affected_refnames
> even further, though.
>
> It is may be sufficient to make sure that we do not throw anything
> that we would need to free as part of destroying affected_refnames
> into the string list, purely from the "leak prevention" perspective.
>
> But stepping back a bit, the reason why the string list exists in
> the first place is to make sure we do not touch the same ref twice
> in a single transaction, the set of all possible updates in the
> single transaction exists somewhere, and each of these updates know
> what ref it wants to update.
>
> And that is recorded in transaction->updates[]->refname no?
>
> So it seems to me that logically any and all string that is
> registered in affected_refnames list must be the refname field of
> a ref_update structure in the transaction.

I'm with you up to here.

> And from that point of view, doesn't split_head_update() wants a
> similar fix?  It attempts to insert "HEAD", makes sure it hasn't
> been inserted and then hangs a new update transaction as its util.
> It is not wrong per-se from purely leak-prevention point of view,
> as that "HEAD" is a literal string we woudn't even want to free,
> but from logical/"what each data means" point of view, it still
> feels wrong.

There is a "Special hack" comment related to this, and I don't feel
particularly confident that I could make any meaningful contribution in
this area. To be honest, I don't immediately see in which direction your
suggestion/idea/thought is going, which tells me I should not be making
a mess out of it. :-)

Martin


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Junio C Hamano
Martin Ågren  writes:

> On 30 August 2017 at 04:52, Michael Haggerty  wrote:
>> v3 looks good to me. Thanks!
>>
>> Reviewed-by: Michael Haggerty 
>
> Thank _you_ for very helpful feedback on the earlier versions.
>
> Martin

Yes, the earlier attempt was sort-of barking up a wrong tree.  

Once we step back and observe other users of affected_refnames and
realize that the list is meant to to point at a refname field of an
existing instance of another structure by borrowing, the blame
shifts from files_transaction_prepare() to the real culprit.
Michael's review gave us a very good "let's step back a bit" that
made the huge improvement between v1 and v2/v3.

I wonder if we should be tightening the use of affected_refnames
even further, though.  

It is may be sufficient to make sure that we do not throw anything
that we would need to free as part of destroying affected_refnames
into the string list, purely from the "leak prevention" perspective.

But stepping back a bit, the reason why the string list exists in
the first place is to make sure we do not touch the same ref twice
in a single transaction, the set of all possible updates in the
single transaction exists somewhere, and each of these updates know
what ref it wants to update.  

And that is recorded in transaction->updates[]->refname no?

So it seems to me that logically any and all string that is
registered in affected_refnames list must be the refname field of
a ref_update structure in the transaction.

And from that point of view, doesn't split_head_update() wants a
similar fix?  It attempts to insert "HEAD", makes sure it hasn't
been inserted and then hangs a new update transaction as its util.
It is not wrong per-se from purely leak-prevention point of view,
as that "HEAD" is a literal string we woudn't even want to free,
but from logical/"what each data means" point of view, it still
feels wrong.




Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Jeff King
On Tue, Sep 05, 2017 at 11:03:36AM +0200, Michael Haggerty wrote:

> > It feels pretty dirty, though. It would certainly be a bug if we ever
> > decided to switch affected_refnames to duplicate its strings.
> >
> > So given that your solution is only a constant-time factor worse in
> > efficiency, we should probably prefer it as the more maintainable
> > option.
> 
> This is clever, but I don't like that it requires outside code to
> change internal `string_list` structures in a way that is not
> documented to be OK.
> 
> If we cared about getting rid of the extra `O(lg N)` search (and I
> agree with you that it doesn't matter in this case), I think the clean
> way to do it would be for `string_list` to expose a method like
> 
> struct string_list_item *string_list_insert_at_index(struct
> string_list *list, size_t index, const char *string);
> 
> and to use it, together with `string_list_find_insert_index()`, to
> avoid having to search twice.

Yes, agreed on all counts.

-Peff


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Michael Haggerty
On Tue, Sep 5, 2017 at 10:45 AM, Jeff King  wrote:
> On Tue, Aug 29, 2017 at 07:18:22PM +0200, Martin Ågren wrote:
>
>> [...]
>> Rearrange the handling of `referent`, so that we don't add it directly
>> to `affected_refnames`. Instead, first just check whether `referent`
>> exists in the string list, and later add `new_update->refname`.
>
> Coincidentally[1] I came across this same leak, and my solution ended up
> slightly different. I'll share it here in case it's of interest.
>
> In your solution we end up searching the string list twice: once to see
> if we have the item, and then again to insert it. Whereas in the
> original we did both with a single search.
>
> But we can observe that either:
>
>   1. The item already existed, in which case our insert was a noop, and
>  we're good.
>
> or
>
>   2. We inserted it, in which case we proceed with creating new_update.
>
>  We can then in O(1) replace the pointer in the string list item
>  with the storage in new_update. We know we're not violating any
>  string_list invariants because the strings contain the same bytes.
>
> I.e.:
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 9266f5ab9d..1d16c1b33e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2178,6 +2178,11 @@ static int split_symref_update(struct files_ref_store 
> *refs,
> update->flags |= REF_LOG_ONLY | REF_NODEREF;
> update->flags &= ~REF_HAVE_OLD;
>
> +   /*
> +* Re-point at the storage provided by our ref_update, which we know
> +* will last as long as the affected_refnames list.
> +*/
> +   item->string = new_update->refname;
> item->util = new_update;
>
> return 0;
>
> It feels pretty dirty, though. It would certainly be a bug if we ever
> decided to switch affected_refnames to duplicate its strings.
>
> So given that your solution is only a constant-time factor worse in
> efficiency, we should probably prefer it as the more maintainable
> option.

This is clever, but I don't like that it requires outside code to
change internal `string_list` structures in a way that is not
documented to be OK.

If we cared about getting rid of the extra `O(lg N)` search (and I
agree with you that it doesn't matter in this case), I think the clean
way to do it would be for `string_list` to expose a method like

struct string_list_item *string_list_insert_at_index(struct
string_list *list, size_t index, const char *string);

and to use it, together with `string_list_find_insert_index()`, to
avoid having to search twice.

Michael


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Jeff King
On Tue, Aug 29, 2017 at 07:18:22PM +0200, Martin Ågren wrote:

> Observe that split_symref_update() creates a `new_update`-object through
> ref_transaction_add_update(), after which `new_update->refname` is a
> copy of `referent`. The difference is, this copy will be freed, and it
> will be freed *after* `affected_refnames` has been cleared.
> 
> Rearrange the handling of `referent`, so that we don't add it directly
> to `affected_refnames`. Instead, first just check whether `referent`
> exists in the string list, and later add `new_update->refname`.

Coincidentally[1] I came across this same leak, and my solution ended up
slightly different. I'll share it here in case it's of interest.

In your solution we end up searching the string list twice: once to see
if we have the item, and then again to insert it. Whereas in the
original we did both with a single search.

But we can observe that either:

  1. The item already existed, in which case our insert was a noop, and
 we're good.

or

  2. We inserted it, in which case we proceed with creating new_update.

 We can then in O(1) replace the pointer in the string list item
 with the storage in new_update. We know we're not violating any
 string_list invariants because the strings contain the same bytes.

I.e.:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9266f5ab9d..1d16c1b33e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2178,6 +2178,11 @@ static int split_symref_update(struct files_ref_store 
*refs,
update->flags |= REF_LOG_ONLY | REF_NODEREF;
update->flags &= ~REF_HAVE_OLD;
 
+   /*
+* Re-point at the storage provided by our ref_update, which we know
+* will last as long as the affected_refnames list.
+*/
+   item->string = new_update->refname;
item->util = new_update;
 
return 0;

It feels pretty dirty, though. It would certainly be a bug if we ever
decided to switch affected_refnames to duplicate its strings.

So given that your solution is only a constant-time factor worse in
efficiency, we should probably prefer it as the more maintainable
option.

-Peff

[1] It's not really a coincidence, of course. All the recent leak
discussion has got both of us prodding at Git with various tools. :)


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-08-30 Thread Martin Ågren
On 30 August 2017 at 04:52, Michael Haggerty  wrote:
> v3 looks good to me. Thanks!
>
> Reviewed-by: Michael Haggerty 

Thank _you_ for very helpful feedback on the earlier versions.

Martin


Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-08-29 Thread Michael Haggerty
v3 looks good to me. Thanks!

Reviewed-by: Michael Haggerty 

Michael