Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()
W dniu 27.09.2016 o 10:39, Jeff King pisze: > On Mon, Sep 26, 2016 at 09:21:10PM +0200, Lars Schneider wrote: > >> On 25 Sep 2016, at 13:26, Jakub Narębskiwrote: >> >>> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: From: Lars Schneider ... +static int packet_write_gently(const int fd_out, const char *buf, size_t size) >>> >>> I'm not sure what naming convention the rest of Git uses, but isn't >>> it more like '*data' rather than '*buf' here? >> >> pkt-line seems to use 'buf' or 'buffer' for everything else. > > I do not think we have definite rules, but I would generally expect to > see "data" as an opaque thing (e.g., passing "void *data" to callbacks). > "buf" or "buffer" makes sense here, but I don't think it really matters > that much either way. True. + static char packet_write_buffer[LARGE_PACKET_MAX]; >>> >>> I think there should be warning (as a comment before function >>> declaration, or before function definition), that packet_write_gently() >>> is not thread-safe (nor reentrant, but the latter does not matter here, >>> I think). >>> >>> Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058 >>> >>> This is not something terribly important; I guess git code has tons >>> of functions not marked as thread-unsafe... >> >> I agree that the function is not thread-safe. However, I can't find >> an example in the Git source that marks a function as not thread-safe. >> Unless is it explicitly stated in the coding guidelines I would prefer >> not to start way to mark functions. There is *one* example: "fill_textconv is not remotely thread-safe;" comment in grep.c, but not in diff.{c,h} where it is declared/defined. Also, it is static function; we should know if it is thread-safe or not. I am thinking about supporting streaming in the future, and perhaps also running different filter drivers (for different files) in parallel. I guess that using "static __thread char packet_write_buffer[...]" is out of question (still not reentrant)? > > I'd agree. A large number of functions in git are not reentrant, and I > would not want to give the impression that those missing a warning are > safe to use. The fact tha git code is undercommented and underdocumented does not mean that we should not add comments and documentation. > + if (size > sizeof(packet_write_buffer) - 4) { >>> >>> First, wouldn't the following be more readable: >>> >>> + if (size + 4 > LARGE_PACKET_MAX) { >> >> Peff suggested that here: >> http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/ > > There is a good reason to do size checks as a subtraction from a known > quantity: you can be sure that you are not introducing an overflow > (e.g., Jakub's suggestion does the wrong thing when "size" is within 4 > bytes of its maximum value). That's unlikely in this case, but then so > is the size exceeding LARGE_PACKET_MAX in the first place (arguably this > should be a die("BUG"), because it is the caller's responsibility to > split their packets. Right. I should train myself to watch for overflows. -- Jakub Narębski
Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()
On Mon, Sep 26, 2016 at 09:21:10PM +0200, Lars Schneider wrote: > On 25 Sep 2016, at 13:26, Jakub Narębskiwrote: > > > W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: > >> From: Lars Schneider > >> ... > >> > >> +static int packet_write_gently(const int fd_out, const char *buf, size_t > >> size) > > > > I'm not sure what naming convention the rest of Git uses, but isn't > > it more like '*data' rather than '*buf' here? > > pkt-line seems to use 'buf' or 'buffer' for everything else. I do not think we have definite rules, but I would generally expect to see "data" as an opaque thing (e.g., passing "void *data" to callbacks). "buf" or "buffer" makes sense here, but I don't think it really matters that much either way. > >> + static char packet_write_buffer[LARGE_PACKET_MAX]; > > > > I think there should be warning (as a comment before function > > declaration, or before function definition), that packet_write_gently() > > is not thread-safe (nor reentrant, but the latter does not matter here, > > I think). > > > > Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058 > > > > This is not something terribly important; I guess git code has tons > > of functions not marked as thread-unsafe... > > I agree that the function is not thread-safe. However, I can't find > an example in the Git source that marks a function as not thread-safe. > Unless is it explicitly stated in the coding guidelines I would prefer > not to start way to mark functions. I'd agree. A large number of functions in git are not reentrant, and I would not want to give the impression that those missing a warning are safe to use. > >> + if (size > sizeof(packet_write_buffer) - 4) { > > > > First, wouldn't the following be more readable: > > > > + if (size + 4 > LARGE_PACKET_MAX) { > > Peff suggested that here: > http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/ There is a good reason to do size checks as a subtraction from a known quantity: you can be sure that you are not introducing an overflow (e.g., Jakub's suggestion does the wrong thing when "size" is within 4 bytes of its maximum value). That's unlikely in this case, but then so is the size exceeding LARGE_PACKET_MAX in the first place (arguably this should be a die("BUG"), because it is the caller's responsibility to split their packets. -Peff
Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()
On 25 Sep 2016, at 13:26, Jakub Narębskiwrote: > W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: >> From: Lars Schneider >> ... >> >> +static int packet_write_gently(const int fd_out, const char *buf, size_t >> size) > > I'm not sure what naming convention the rest of Git uses, but isn't > it more like '*data' rather than '*buf' here? pkt-line seems to use 'buf' or 'buffer' for everything else. >> +{ >> +static char packet_write_buffer[LARGE_PACKET_MAX]; > > I think there should be warning (as a comment before function > declaration, or before function definition), that packet_write_gently() > is not thread-safe (nor reentrant, but the latter does not matter here, > I think). > > Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058 > > This is not something terribly important; I guess git code has tons > of functions not marked as thread-unsafe... I agree that the function is not thread-safe. However, I can't find an example in the Git source that marks a function as not thread-safe. Unless is it explicitly stated in the coding guidelines I would prefer not to start way to mark functions. >> +if (size > sizeof(packet_write_buffer) - 4) { > > First, wouldn't the following be more readable: > > +if (size + 4 > LARGE_PACKET_MAX) { Peff suggested that here: http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/ >> +return error("packet write failed - data exceeds max packet >> size"); >> +} > > Second, CodingGuidelines is against using braces (blocks) for one > line conditionals: "We avoid using braces unnecessarily." > > But this is just me nitpicking. Fixed. >> +packet_trace(buf, size, 1); >> +size += 4; >> +set_packet_header(packet_write_buffer, size); >> +memcpy(packet_write_buffer + 4, buf, size - 4); >> +if (write_in_full(fd_out, packet_write_buffer, size) == size) > > Hmmm... in some places we use original size, in others (original) size + 4; > perhaps it would be more readable to add a new local temporary variable > > size_t full_size = size + 4; Agreed. I introduced "packet_size". Thanks, Lars
Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()
W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: > From: Lars Schneider> > packet_write_fmt_gently() uses format_packet() which lets the caller > only send string data via "%s". That means it cannot be used for > arbitrary data that may contain NULs. > > Add packet_write_gently() which writes arbitrary data and does not die > in case of an error. The function is used by other pkt-line functions in > a subsequent patch. Nice; obviously needed for sending binary data. I wonder how send-pack / receive-pack handles sending binary files. Though this is outside of scope of this patch series, it is something to think about for later. > > Signed-off-by: Lars Schneider > --- > pkt-line.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/pkt-line.c b/pkt-line.c > index 19f0271..fc0ac12 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -171,6 +171,22 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...) > return status; > } > > +static int packet_write_gently(const int fd_out, const char *buf, size_t > size) I'm not sure what naming convention the rest of Git uses, but isn't it more like '*data' rather than '*buf' here? > +{ > + static char packet_write_buffer[LARGE_PACKET_MAX]; I think there should be warning (as a comment before function declaration, or before function definition), that packet_write_gently() is not thread-safe (nor reentrant, but the latter does not matter here, I think). Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058 This is not something terribly important; I guess git code has tons of functions not marked as thread-unsafe... > + > + if (size > sizeof(packet_write_buffer) - 4) { First, wouldn't the following be more readable: + if (size + 4 > LARGE_PACKET_MAX) { > + return error("packet write failed - data exceeds max packet > size"); > + } Second, CodingGuidelines is against using braces (blocks) for one line conditionals: "We avoid using braces unnecessarily." But this is just me nitpicking. > + packet_trace(buf, size, 1); > + size += 4; > + set_packet_header(packet_write_buffer, size); > + memcpy(packet_write_buffer + 4, buf, size - 4); > + if (write_in_full(fd_out, packet_write_buffer, size) == size) Hmmm... in some places we use original size, in others (original) size + 4; perhaps it would be more readable to add a new local temporary variable size_t full_size = size + 4; Or perhaps use 'data_size' and 'packet_size', where 'packet_size = data_size + 4'. But that might be too chatty for variable names ;-) > + return 0; > + return error("packet write failed"); > +} Compared to previous iterations, where there were two versions of this function, IIRC sharing no common code: one taking buffer which had to be with place for packet size info, one with a separate local buffer for packet size only and using two writes. This version uses static buffer (thus not thread-safe, I think; and not reentrant), and memcpy. Anyway, if reentrant / thread-safe version would be required, or not doing memcpy turns out to be important with respect to performance, we can provide with the *_r version: static int packet_write_gently_r(const int fd_out, const char *data, size_t size, char *restrict buf) We can check if 'buf + 4 == data', and if it is, we can skip memcpy() as an optimization. This is something for the future, but not very important for having this patch series accepted. > + > void packet_buf_write(struct strbuf *buf, const char *fmt, ...) > { > va_list args; > Best, -- Jakub Narębski