Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()

2016-09-27 Thread Jakub Narębski
W dniu 27.09.2016 o 10:39, Jeff King pisze:
> On Mon, Sep 26, 2016 at 09:21:10PM +0200, Lars Schneider wrote:
> 
>> On 25 Sep 2016, at 13:26, Jakub Narębski  wrote:
>>
>>> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
 From: Lars Schneider 
 ...

 +static int packet_write_gently(const int fd_out, const char *buf, size_t 
 size)
>>>
>>> I'm not sure what naming convention the rest of Git uses, but isn't
>>> it more like '*data' rather than '*buf' here?
>>
>> pkt-line seems to use 'buf' or 'buffer' for everything else.
> 
> I do not think we have definite rules, but I would generally expect to
> see "data" as an opaque thing (e.g., passing "void *data" to callbacks).
> "buf" or "buffer" makes sense here, but I don't think it really matters
> that much either way.

True.

 +  static char packet_write_buffer[LARGE_PACKET_MAX];
>>>
>>> I think there should be warning (as a comment before function
>>> declaration, or before function definition), that packet_write_gently()
>>> is not thread-safe (nor reentrant, but the latter does not matter here,
>>> I think).
>>>
>>> Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058
>>>
>>> This is not something terribly important; I guess git code has tons
>>> of functions not marked as thread-unsafe...
>>
>> I agree that the function is not thread-safe. However, I can't find 
>> an example in the Git source that marks a function as not thread-safe.
>> Unless is it explicitly stated in the coding guidelines I would prefer
>> not to start way to mark functions.

There is *one* example: "fill_textconv is not remotely thread-safe;"
comment in grep.c, but not in diff.{c,h} where it is declared/defined.

Also, it is static function; we should know if it is thread-safe
or not.

I am thinking about supporting streaming in the future, and perhaps
also running different filter drivers (for different files) in parallel.
I guess that using "static __thread char packet_write_buffer[...]"
is out of question (still not reentrant)?

> 
> I'd agree. A large number of functions in git are not reentrant, and I
> would not want to give the impression that those missing a warning are
> safe to use.

The fact tha git code is undercommented and underdocumented does not
mean that we should not add comments and documentation.

> 
 +  if (size > sizeof(packet_write_buffer) - 4) {
>>>
>>> First, wouldn't the following be more readable:
>>>
>>>  +  if (size + 4 > LARGE_PACKET_MAX) {
>>
>> Peff suggested that here:
>> http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/
> 
> There is a good reason to do size checks as a subtraction from a known
> quantity: you can be sure that you are not introducing an overflow
> (e.g., Jakub's suggestion does the wrong thing when "size" is within 4
> bytes of its maximum value). That's unlikely in this case, but then so
> is the size exceeding LARGE_PACKET_MAX in the first place (arguably this
> should be a die("BUG"), because it is the caller's responsibility to
> split their packets.

Right.  I should train myself to watch for overflows.

-- 
Jakub Narębski



Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()

2016-09-27 Thread Jeff King
On Mon, Sep 26, 2016 at 09:21:10PM +0200, Lars Schneider wrote:

> On 25 Sep 2016, at 13:26, Jakub Narębski  wrote:
> 
> > W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
> >> From: Lars Schneider 
> >> ...
> >> 
> >> +static int packet_write_gently(const int fd_out, const char *buf, size_t 
> >> size)
> > 
> > I'm not sure what naming convention the rest of Git uses, but isn't
> > it more like '*data' rather than '*buf' here?
> 
> pkt-line seems to use 'buf' or 'buffer' for everything else.

I do not think we have definite rules, but I would generally expect to
see "data" as an opaque thing (e.g., passing "void *data" to callbacks).
"buf" or "buffer" makes sense here, but I don't think it really matters
that much either way.

> >> +  static char packet_write_buffer[LARGE_PACKET_MAX];
> > 
> > I think there should be warning (as a comment before function
> > declaration, or before function definition), that packet_write_gently()
> > is not thread-safe (nor reentrant, but the latter does not matter here,
> > I think).
> > 
> > Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058
> > 
> > This is not something terribly important; I guess git code has tons
> > of functions not marked as thread-unsafe...
> 
> I agree that the function is not thread-safe. However, I can't find 
> an example in the Git source that marks a function as not thread-safe.
> Unless is it explicitly stated in the coding guidelines I would prefer
> not to start way to mark functions.

I'd agree. A large number of functions in git are not reentrant, and I
would not want to give the impression that those missing a warning are
safe to use.

> >> +  if (size > sizeof(packet_write_buffer) - 4) {
> > 
> > First, wouldn't the following be more readable:
> > 
> >  +  if (size + 4 > LARGE_PACKET_MAX) {
> 
> Peff suggested that here:
> http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/

There is a good reason to do size checks as a subtraction from a known
quantity: you can be sure that you are not introducing an overflow
(e.g., Jakub's suggestion does the wrong thing when "size" is within 4
bytes of its maximum value). That's unlikely in this case, but then so
is the size exceeding LARGE_PACKET_MAX in the first place (arguably this
should be a die("BUG"), because it is the caller's responsibility to
split their packets.

-Peff


Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()

2016-09-26 Thread Lars Schneider

On 25 Sep 2016, at 13:26, Jakub Narębski  wrote:

> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider 
>> ...
>> 
>> +static int packet_write_gently(const int fd_out, const char *buf, size_t 
>> size)
> 
> I'm not sure what naming convention the rest of Git uses, but isn't
> it more like '*data' rather than '*buf' here?

pkt-line seems to use 'buf' or 'buffer' for everything else.


>> +{
>> +static char packet_write_buffer[LARGE_PACKET_MAX];
> 
> I think there should be warning (as a comment before function
> declaration, or before function definition), that packet_write_gently()
> is not thread-safe (nor reentrant, but the latter does not matter here,
> I think).
> 
> Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058
> 
> This is not something terribly important; I guess git code has tons
> of functions not marked as thread-unsafe...

I agree that the function is not thread-safe. However, I can't find 
an example in the Git source that marks a function as not thread-safe.
Unless is it explicitly stated in the coding guidelines I would prefer
not to start way to mark functions.


>> +if (size > sizeof(packet_write_buffer) - 4) {
> 
> First, wouldn't the following be more readable:
> 
>  +if (size + 4 > LARGE_PACKET_MAX) {

Peff suggested that here:
http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/


>> +return error("packet write failed - data exceeds max packet 
>> size");
>> +}
> 
> Second, CodingGuidelines is against using braces (blocks) for one
> line conditionals: "We avoid using braces unnecessarily."
> 
> But this is just me nitpicking.

Fixed.


>> +packet_trace(buf, size, 1);
>> +size += 4;
>> +set_packet_header(packet_write_buffer, size);
>> +memcpy(packet_write_buffer + 4, buf, size - 4);
>> +if (write_in_full(fd_out, packet_write_buffer, size) == size)
> 
> Hmmm... in some places we use original size, in others (original) size + 4;
> perhaps it would be more readable to add a new local temporary variable
> 
>   size_t full_size = size + 4;

Agreed. I introduced "packet_size".

Thanks,
Lars

Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()

2016-09-25 Thread Jakub Narębski
W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
> From: Lars Schneider 
> 
> packet_write_fmt_gently() uses format_packet() which lets the caller
> only send string data via "%s". That means it cannot be used for
> arbitrary data that may contain NULs.
> 
> Add packet_write_gently() which writes arbitrary data and does not die
> in case of an error. The function is used by other pkt-line functions in
> a subsequent patch.

Nice; obviously needed for sending binary data.

I wonder how send-pack / receive-pack handles sending binary files.
Though this is outside of scope of this patch series, it is something
to think about for later.

> 
> Signed-off-by: Lars Schneider 
> ---
>  pkt-line.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/pkt-line.c b/pkt-line.c
> index 19f0271..fc0ac12 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,6 +171,22 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
>   return status;
>  }
>  
> +static int packet_write_gently(const int fd_out, const char *buf, size_t 
> size)

I'm not sure what naming convention the rest of Git uses, but isn't
it more like '*data' rather than '*buf' here?

> +{
> + static char packet_write_buffer[LARGE_PACKET_MAX];

I think there should be warning (as a comment before function
declaration, or before function definition), that packet_write_gently()
is not thread-safe (nor reentrant, but the latter does not matter here,
I think).

Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058

This is not something terribly important; I guess git code has tons
of functions not marked as thread-unsafe...

> +
> + if (size > sizeof(packet_write_buffer) - 4) {

First, wouldn't the following be more readable:

  + if (size + 4 > LARGE_PACKET_MAX) {

> + return error("packet write failed - data exceeds max packet 
> size");
> + }

Second, CodingGuidelines is against using braces (blocks) for one
line conditionals: "We avoid using braces unnecessarily."

But this is just me nitpicking.

> + packet_trace(buf, size, 1);
> + size += 4;
> + set_packet_header(packet_write_buffer, size);
> + memcpy(packet_write_buffer + 4, buf, size - 4);
> + if (write_in_full(fd_out, packet_write_buffer, size) == size)

Hmmm... in some places we use original size, in others (original) size + 4;
perhaps it would be more readable to add a new local temporary variable

size_t full_size = size + 4;

Or perhaps use 'data_size' and 'packet_size', where 'packet_size = data_size + 
4'.
But that might be too chatty for variable names ;-)

> + return 0;
> + return error("packet write failed");
> +}

Compared to previous iterations, where there were two versions
of this function, IIRC sharing no common code: one taking buffer
which had to be with place for packet size info, one with a separate
local buffer for packet size only and using two writes.  This
version uses static buffer (thus not thread-safe, I think; and
not reentrant), and memcpy.

Anyway, if reentrant / thread-safe version would be required,
or not doing memcpy turns out to be important with respect to
performance, we can provide with the *_r version:

  static int packet_write_gently_r(const int fd_out, const char *data, size_t 
size,
   char *restrict buf)

We can check if 'buf + 4 == data', and if it is, we can skip
memcpy() as an optimization.

This is something for the future, but not very important for
having this patch series accepted.

> +
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
>  {
>   va_list args;
> 

Best,
-- 
Jakub Narębski