Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-05 Thread Brad King
On 9/5/2013 5:23 PM, Junio C Hamano wrote:
> Brad King  writes:
>>  create SP  NUL  NUL
>>  update SP  NUL  NUL [] NUL
> 
> That SP in '-z' format looks strange.  Was there a reason why NUL
> was inappropriate?

The precedent I saw in the -z survey I posted is that NUL is used
to terminate fields that can contain arbitrary text but otherwise
SP or TAB is still used to terminate simple fields.  I recall no
cases that use NUL after fields that only contain simple values.

>> option::
>>  Specify an option to take effect for following commands.
> 
> This last one is somewhat peculiar, especially because it says
> "following command*s*".
> 
> How would I request to update refs HEAD and next in an all-or-none
> fashion, while applying 'no-deref' only to HEAD but not next?

You're right.  It will be simpler for clients to generate each
sequence of options followed by a  command without worrying
about options possibly generated for preceding s.  So:

option::
Modify behavior of the next command naming a .
The only valid option is `no-deref` to avoid dereferencing
a symbolic ref.

> When I said "create or update" in the message you are responding to,
> I did not mean that we should have two separate commands.

The nice thing about "create" is that it implies oldvalue=zero, just
as "delete" implies newvalue=zero.  The symmetry feels clean.  Also,
in -z mode we need to treat an empty string  as missing
rather than zero.  The only way to specify create with the "update"
command is with literal 40 "0" as .

> The regular command line does create-or-update; if it exists already,
> it is an update, and if it does not, it is a create.

The proposed update command can be used for create, update, delete,
or verify with proper arguments and use of 40 "0".  The other 
commands are for convenience, shorter input, and statement of intent.

>   create-or-update-no-deref   []
> delete-no-deref  []
> 
> Also how would one set the reflog message for the proposed update?

This is why I proposed a separate option command.  It can be used
both for no-deref and for other options we want to add later e.g.

  option SP message SP  LF

and with -z:

  option SP message SP  NUL

For now I think it is sufficient for the -m  command-line
option to set the same message for all updates.  The option command
leaves room to add a per-ref message later.

BTW, the reasons I propose message as an option rather than its own
command are:

* It is simpler to document the reach of the one option command
  (affects next ) in one place rather than separately for
  each option-like command.

* It reduces the number of commands that do not take a .

-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-05 Thread Junio C Hamano
Brad King  writes:

> On 09/04/2013 05:27 PM, Junio C Hamano wrote:
>> I am not saying the above is the best format, but the point is that
>> the mode of the operation defines the structure
>
> Great, thanks for your comments.  Based on that I've prototyped a
> new format.  Rather than jumping straight to the patch, here is my
> proposed format documentation for review:
>
> -
> With `--stdin`, update-ref reads instructions from standard input and
> performs all modifications together.  Specify commands of the form:
>
>   create SP  SP  LF
>   update SP  SP  [SP ] LF
>   delete SP  [SP ] LF
>   verify SP  [SP ] LF
>   option SP  LF
>
> Quote fields containing whitespace as if they were strings in C source
> code.  Alternatively, use `-z` to specify commands without quoting:
>
>   create SP  NUL  NUL
>   update SP  NUL  NUL [] NUL
>   delete SP  NUL [] NUL
>   verify SP  NUL [] NUL
>   option SP  NUL

That SP in '-z' format looks strange.  Was there a reason why NUL
was inappropriate?

> Lines of any other format or a repeated  produce an error.
> Command meanings are:
>
> create::
>   Create  with  only if it does not exist.
>
> update::
>   Update  to be , verifying  if given.
>   Specify a zero  to delete a ref and/or a zero
>to make sure that a ref does not exist.
>
> delete::
>   Delete , verifying  if given.
>
> verify::
>   Verify  against  but do not change it.  If
>zero or missing, the ref must not exist.
>
> option::
>   Specify an option to take effect for following commands.
>   Valid options are `deref` and `no-deref` to specify whether
>   to dereference symbolic refs.

This last one is somewhat peculiar, especially because it says
"following command*s*".

How would I request to update refs HEAD and next in an all-or-none
fashion, while applying 'no-deref' only to HEAD but not next?

update refs/heads/next 
option --no-deref
update HEAD 

sounds somewhat convoluted.

When I said "create or update" in the message you are responding to,
I did not mean that we should have two separate commands.  The
regular command line does create-or-update; if it exists already, it
is an update, and if it does not, it is a create.

If we were to have two, I would say we should have:

create-or-update   []
create-or-update-no-deref   []

An old value of 0{40} would mean "I should be the one creating; if
somebody else created it while I was preparing this request, please
fail".

Similarly for delete:

delete  []
delete-no-deref  []

Also how would one set the reflog message for the proposed update?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-05 Thread Brad King
On 09/04/2013 05:27 PM, Junio C Hamano wrote:
> I am not saying the above is the best format, but the point is that
> the mode of the operation defines the structure

Great, thanks for your comments.  Based on that I've prototyped a
new format.  Rather than jumping straight to the patch, here is my
proposed format documentation for review:

-
With `--stdin`, update-ref reads instructions from standard input and
performs all modifications together.  Specify commands of the form:

create SP  SP  LF
update SP  SP  [SP ] LF
delete SP  [SP ] LF
verify SP  [SP ] LF
option SP  LF

Quote fields containing whitespace as if they were strings in C source
code.  Alternatively, use `-z` to specify commands without quoting:

create SP  NUL  NUL
update SP  NUL  NUL [] NUL
delete SP  NUL [] NUL
verify SP  NUL [] NUL
option SP  NUL

Lines of any other format or a repeated  produce an error.
Command meanings are:

create::
Create  with  only if it does not exist.

update::
Update  to be , verifying  if given.
Specify a zero  to delete a ref and/or a zero
 to make sure that a ref does not exist.

delete::
Delete , verifying  if given.

verify::
Verify  against  but do not change it.  If
 zero or missing, the ref must not exist.

option::
Specify an option to take effect for following commands.
Valid options are `deref` and `no-deref` to specify whether
to dereference symbolic refs.

Use 40 "0" or the empty string to specify a zero value, except that
with `-z` an empty  is considered missing.

If all s can be locked with matching s
simultaneously, all modifications are performed.  Otherwise, no
modifications are performed.  Note that while each individual
 is updated or deleted atomically, a concurrent reader may
still see a subset of the modifications.
-

Thanks,
-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-04 Thread Junio C Hamano
Brad King  writes:

> Nothing else uses LF NUL.  I chose it as a starting point for
> this very discussion, which I asked about in $gmane/233653.

The primary reason why LF raised my eyebrow was because the reason
why many subcommands use "-z" (and NUL) is often because the payload
may have LF in a record and LF cannot be used as a record separator
without escaping.  And they use NUL knowing that the payload data in
fields cannot contain a NUL.  If we used LF as a signal to define
the structure of the record, it pretty much defeats the whole point
of defining "-z" format.  The -m reason string will be made into a
single liner deep in the codepath but it _can_ contain LF.

I would have been more receptive to, say, double-NUL as a record
terminator, while using a NUL as a field terminator, or something,
but then we would need to have a way to express an empty field.

> In this particular use case we know the last field will never
> be LF but that may not be so for future cases.  There is no way
> to represent sentinel-terminated arbitrary variable-width records
> of NUL-terminated fields without some kind of escaping for the
> sentinel value, but the whole point of -z is to avoid escaping.

Indeed, but one escape hatch we have is that payload will not
contain NUL anywhere, so whenever we see a NUL, we can trust that it
defines the structure of the record, and is not a part of the
payload.

Stepping back a bit, here are some observations on the arguments
update-ref can take:

 * "-m " is a reason given for this entire update. As the
   point of this new feature is to give an all-or-none update to one
   or more refs, I think we should not have to accept more than one
   reason (more specifically, the -m option does _not_ belong to a
   specific record that describes what happens to a single ref).

 * "-d  " is a way to delete a ref.  may be
   missing.

 * "--no-deref   " and " 
   " are ways to update or create a ref. Again 
   may be missing.

So it looks to me that one possible format that is easy to generate
by machine without ambiguity may be:

* The first record could be

  m NUL  NUL

  but it is optional. The reason string may contain LF but just
  like invocation from the command line, LF will eventually
  cleaned up into a SP.

* Then a series of records of different kinds follow.

  - A delete record looks like this:

d NUL  NUL  NUL

If you want to delete the ref without "oldvalue" protection,
just say

d NUL  NUL NUL

  - A create/update record looks like one of these:

u NUL  NUL  NUL  NUL
n NUL  NUL  NUL  NUL

Again, if you want to delete the ref without "oldvalue"
protection, just say

u NUL  NUL  NUL NUL
n NUL  NUL  NUL NUL

 * EOF signals the end of the request.

I am not saying the above is the best format, but the point is that
the mode of the operation defines the structure, so unlike parsing
xml or json where you first parse the structure and then interpret
what each element means, you can define a simple format where the
kind of element comes upfront to allow the parser/interpreter know
what is expected to follow it.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-04 Thread Brad King
On 09/04/2013 02:23 PM, Junio C Hamano wrote:
> "whitespace-separated" implies that we may allow fields separated with not a 
> single SP, but with double SPs or even HTs between them. I personally do not 
> think we should be so loose

Okay, I will look at making it more strict.  See proposed format
below.

>> +Quote arguments containing whitespace as if in C source code.
> 
> Probably "as if they were strings in C source code".

Fixed.

>> +terminate each argument by NUL and each line by LF NUL.
> 
> This is a somewhat interesting choice of the record terminator. Do we have a 
> precedent to use LF NUL elsewhere?  If this is the first such case where we 
> need to express variable number of NUL-separated fields in a record, I think 
> I am fine with LF NUL (but I am sure other people would give us a
> better convention if we ask them politely ;-)), but I just want to make sure 
> we do it the same way as other codepaths, if exist, that have to handle this 
> kind of thing.

Nothing else uses LF NUL.  I chose it as a starting point for
this very discussion, which I asked about in $gmane/233653.
In this particular use case we know the last field will never
be LF but that may not be so for future cases.  There is no way
to represent sentinel-terminated arbitrary variable-width records
of NUL-terminated fields without some kind of escaping for the
sentinel value, but the whole point of -z is to avoid escaping.

Below is a survey of all mentions of NUL and \0 in documented
formats as of v1.8.4.  The summary is that most are fixed-width
records but a few have variable width allowing n or n+1 fields.
In all variable-width cases there is structured information in
the first field that indicates the number of NUL-terminated
fields to expect.

In the motivating case here, we could use a --no-old or
--have-old option to indicate in one field how many more to
expect in the record, but that will be quite verbose.

Side note: I'd like to reserve room for the leading options to
include things like "-m NUL  NUL" so we cannot keep them
all in in a single NUL-terminated, SP-separated field.

Another approach is to introduce a way to represent "not here"
for the  argument that is not an otherwise valid value.
This would make the non-option part of the record have a fixed
width of 3 fields.  For example, we could use SP in -z mode:

 [- NUL]...  NUL  NUL (|SP) NUL

and the last field can be optional in non-z mode anyway:

 [- SP]...  SP  [SP ] LF

Or we could use a character like "~" (other ideas?):

 [- NUL]...  NUL  NUL (|~) NUL

and make it available in non-z mode too:

 [- SP]...  SP  [SP (|~)] LF

Thoughts?
-Brad


Survey of NUL in documented formats:

* Documentation/diff-format.txt: The -z mode for --numstat prints
  NUL-terminated lines but there is exactly one path at the end of
  each entry and the earlier fields are separated by TAB because
  they are structured.

* Documentation/diff-options.txt: The -z mode for diff-tree output
  prints structured SP/TAB-separated fields in a NUL-terminated
  field followed by either one or two NUL-terminated paths.  This
  is variable width but the first field tells us how wide.

* Documentation/git-apply.txt: The -z mode forwards to --numstat
  diff options.

* Documentation/git-check-attr.txt: The -z mode for stdin reads
  NUL-terminated paths.

* Documentation/git-check-ignore.txt: The -z mode for stdin reads
  NUL-terminated paths.  The -z mode for output prints a fixed-width
  table with every group of four NUL-terminated fields forming a row.

* Documentation/git-checkout-index.txt: The -z mode reads
  NUL-terminated paths.

* Documentation/git-commit.txt: The -z mode forwards to git-status.

* Documentation/git-grep.txt: The -z mode separates file names from
  the matched line by a NUL.  Therefore NUL divides LF-terminated
  lines into two pieces.

* Documentation/git-ls-files.txt: The -z mode prints NUL-terminated
  lines but there is exactly one path at the end of each entry and the
  earlier fields are separated by SP and TAB because they are
  structured.

* Documentation/git-ls-tree.txt: The -z mode prints NUL-terminated
  lines but there is exactly one path at the end of each entry and the
  earlier fields are separated by SP and TAB because they are
  structured.

* Documentation/git-mktree.txt: The -z mode reads NUL-terminated lines
  as output by ls-tree -z.

* Documentation/git-status.txt: The -z mode of --porcelain separates
  a variable number of entries by NUL.  The beginning of each entry
  allows one to know the number of NUL-terminated fields to expect
  (A = 1 total NUL, R = 2 total NULs, etc.).

* Documentation/git-update-index.txt: The -z mode of --stdin separates
  paths by NUL.  The -z mode of --index-info separates entries by NUL
  but there is exactly one path at the end of each entry and the earlier
  fields are separated by SP and TAB because they are structured.

* Documenta

Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-04 Thread Brad King
On 09/04/2013 03:17 PM, Junio C Hamano wrote:
> Brad King  writes:
>> +static void update_refs_stdin_read_n()
>> +static void update_refs_stdin_read_z()
> 
> These need to be defined with explicit (void) argument list.

Oops, fixed.

Thanks,
-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-04 Thread Junio C Hamano
Brad King  writes:

> +static void update_refs_stdin_read_n()
> +{
> + struct strbuf line = STRBUF_INIT;
> +
> + while (strbuf_getline(&line, stdin, '\n') != EOF)
> + update_refs_stdin_parse_line(line.buf);
> +
> + strbuf_release(&line);
> +}
> +
> +static void update_refs_stdin_read_z()
> +{

These need to be defined with explicit (void) argument list.

static void foo(void)
{
...

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-04 Thread Junio C Hamano
Brad King  writes:

> +With `--stdin`, update-ref reads instructions from standard input and
> +performs all modifications together.  Empty lines are ignored.
> +Each non-empty line is parsed as whitespace-separated arguments.

"whitespace-separated" implies that we may allow fields separated
with not a single SP, but with double SPs or even HTs between them.
I personally do not think we should be so loose; it is not like we
are allowing the user to type these from the terminal, so limiting
"separated by a single SP" may be perfectly adequate, and make it
less error prone (e.g. there will not be a confusion "Is 'A SP SP B'
A followed by B, or A followed by an empty string followed by B?").

> +Quote arguments containing whitespace as if in C source code.

Probably "as if they were strings in C source code".

> +Specify updates with lines of the form:
> +
> + [--no-deref] [--]   []
> +
> +Lines of any other format or a repeated  produce an error.
> +Specify a zero  to delete a ref and/or a zero 
> +to make sure that a ref not exist.  Use either 40 "0" or the
> +empty string (written as "") to specify a zero value.  Use `-z`
> +to specify input with no whitespace, quoting, or escaping, and
> +terminate each argument by NUL and each line by LF NUL.

This is a somewhat interesting choice of the record terminator. Do
we have a precedent to use LF NUL elsewhere?  If this is the first
such case where we need to express variable number of NUL-separated
fields in a record, I think I am fine with LF NUL (but I am sure
other people would give us a better convention if we ask them
politely ;-)), but I just want to make sure we do it the same way as
other codepaths, if exist, that have to handle this kind of thing.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html