Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
On Tue, Sep 27, 2016 at 02:10:50PM +0200, Lars Schneider wrote: > > That being said, why don't you just use LARGE_PACKET_MAX here? It is > > already the accepted size for feeding to packet_read(), and we know it > > has enough space to hold a NUL terminator. Yes, we may over-allocate by > > 4 bytes, but that isn't really relevant. Strbufs over-allocate anyway. > > TBH in that case I would prefer the "PKTLINE_DATA_MAXLEN+1" solution with > an additional comment explaining "+1". > > Would that be OK for you? > > I am not worried about the extra 4 bytes. I am worried that we make it harder > to see what is going on if we use LARGE_PACKET_MAX. I guess I don't feel to strongly either way. My interest in LARGE_PACKET_MAX is mostly that this is how all the rest of the packet_read() callers behave. -Peff
Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
> On 27 Sep 2016, at 11:00, Jeff King wrote: > > On Tue, Sep 27, 2016 at 10:14:16AM +0200, Lars Schneider wrote: > > + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1); > + paket_len = packet_read(fd_in, NULL, NULL, > + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, > options); >> [...] >> After looking at it with fresh eyes I think the existing code is probably >> correct, >> but maybe a bit confusing. >> >> packet_read() adds a '\0' at the end of the destination buffer: >> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206 >> >> That is why the destination buffer needs to be one byte larger than the >> expected content. >> >> However, in this particular case that wouldn't be necessary because the >> destination >> buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But >> we are not >> supposed to write to this extra byte: >> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31 > > Right. The allocation happens as part of strbuf_grow(), but whatever > fills the buffer is expected to write the actual NUL (either manually, > or by calling strbuf_setlen(). > > I see the bit you quoted warns not to touch the extra byte yourself, > though I wonder if that is a bit heavy-handed (I guess it would matter > if we moved the extra 1-byte growth into strbuf_setlen(), but I find > that a rather unlikely change). > > That being said, why don't you just use LARGE_PACKET_MAX here? It is > already the accepted size for feeding to packet_read(), and we know it > has enough space to hold a NUL terminator. Yes, we may over-allocate by > 4 bytes, but that isn't really relevant. Strbufs over-allocate anyway. TBH in that case I would prefer the "PKTLINE_DATA_MAXLEN+1" solution with an additional comment explaining "+1". Would that be OK for you? I am not worried about the extra 4 bytes. I am worried that we make it harder to see what is going on if we use LARGE_PACKET_MAX. > So just: > > for (;;) { >strbuf_grow(sb_out, LARGE_PACKET_MAX); >packet_len = packet_read(fd_in, NULL, NULL, >sb_out->buf + sb_out->len, LARGE_PACKET_MAX, >options); >if (packet_len <= 0) >break; >/* > * no need for strbuf_setlen() here; packet_read always adds a > * NUL terminator. > */ >sb_out->len += packet_len; > } > > You _could_ make the final line of the loop use strbuf_setlen(); it > would just overwrite something we already know is a NUL (and we know > that no extra allocation is necessary). > > Also, using LARGE_PACKET_MAX fixes the fact that this patch is using > PKTLINE_DATA_MAXLEN before it is actually defined. :) Yeah, I noticed that too and fixed it in v9 :-) Thanks for the reminder! > You might want to occasionally run: > > git rebase -x make > > to make sure all of your incremental steps are valid (or even "make > test" if you are extremely patient; I often do that once after a big > round of complex interactive-rebase reordering). That is a good suggestion. I'll add that to my "tool-belt" :-) Thanks, Lars
Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
On Tue, Sep 27, 2016 at 10:14:16AM +0200, Lars Schneider wrote: > >>> + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1); > >>> + paket_len = packet_read(fd_in, NULL, NULL, > >>> + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, > >>> options); > [...] > After looking at it with fresh eyes I think the existing code is probably > correct, > but maybe a bit confusing. > > packet_read() adds a '\0' at the end of the destination buffer: > https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206 > > That is why the destination buffer needs to be one byte larger than the > expected content. > > However, in this particular case that wouldn't be necessary because the > destination > buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But we > are not > supposed to write to this extra byte: > https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31 Right. The allocation happens as part of strbuf_grow(), but whatever fills the buffer is expected to write the actual NUL (either manually, or by calling strbuf_setlen(). I see the bit you quoted warns not to touch the extra byte yourself, though I wonder if that is a bit heavy-handed (I guess it would matter if we moved the extra 1-byte growth into strbuf_setlen(), but I find that a rather unlikely change). That being said, why don't you just use LARGE_PACKET_MAX here? It is already the accepted size for feeding to packet_read(), and we know it has enough space to hold a NUL terminator. Yes, we may over-allocate by 4 bytes, but that isn't really relevant. Strbufs over-allocate anyway. So just: for (;;) { strbuf_grow(sb_out, LARGE_PACKET_MAX); packet_len = packet_read(fd_in, NULL, NULL, sb_out->buf + sb_out->len, LARGE_PACKET_MAX, options); if (packet_len <= 0) break; /* * no need for strbuf_setlen() here; packet_read always adds a * NUL terminator. */ sb_out->len += packet_len; } You _could_ make the final line of the loop use strbuf_setlen(); it would just overwrite something we already know is a NUL (and we know that no extra allocation is necessary). Also, using LARGE_PACKET_MAX fixes the fact that this patch is using PKTLINE_DATA_MAXLEN before it is actually defined. :) You might want to occasionally run: git rebase -x make to make sure all of your incremental steps are valid (or even "make test" if you are extremely patient; I often do that once after a big round of complex interactive-rebase reordering). > I see two options: > > > (1) I leave the +1 as is and add a comment why the extra byte is necessary. > > Pro: No change in existing code necessary > Con: The destination buffer has two '\0' at the end. > > > (2) I add an option PACKET_READ_DISABLE_NUL_TERMINATION. If the option is > set then no '\0' byte is added to the end. > > Pro: Correct solution, no byte wasted. > Con: Change in existing code required. > > > Any preference? Of the two, I prefer (1), though I like what I suggested above even more (big surprise, I know). -Peff
Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
On 26 Sep 2016, at 22:23, Lars Schneider wrote: > > On 25 Sep 2016, at 15:46, Jakub Narębski wrote: > >> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: >>> From: Lars Schneider > > >>> + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1); >>> + paket_len = packet_read(fd_in, NULL, NULL, >>> + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, >>> options); >> >> A question (which perhaps was answered during the development of this >> patch series): why is this +1 in PKTLINE_DATA_MAXLEN+1 here? > > Nice catch. I think this is wrong: > https://github.com/git/git/blob/6fe1b1407ed91823daa5d487abe457ff37463349/pkt-line.c#L196 > > It should be "if (len > size)" ... then we don't need the "+1" here. > (but I need to think a bit more about this) After looking at it with fresh eyes I think the existing code is probably correct, but maybe a bit confusing. packet_read() adds a '\0' at the end of the destination buffer: https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206 That is why the destination buffer needs to be one byte larger than the expected content. However, in this particular case that wouldn't be necessary because the destination buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But we are not supposed to write to this extra byte: https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31 I see two options: (1) I leave the +1 as is and add a comment why the extra byte is necessary. Pro: No change in existing code necessary Con: The destination buffer has two '\0' at the end. (2) I add an option PACKET_READ_DISABLE_NUL_TERMINATION. If the option is set then no '\0' byte is added to the end. Pro: Correct solution, no byte wasted. Con: Change in existing code required. Any preference? Thanks, Lars
Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
On 25 Sep 2016, at 15:46, Jakub Narębski wrote: > W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: >> From: Lars Schneider >> +{ >> +static char buf[PKTLINE_DATA_MAXLEN]; > > Sidenote: we have LARGE_PACKET_MAX (used in previous patch), but > PKTLINE_DATA_MAXLEN not LARGE_PACKET_DATA_MAX. Agreed, I will rename it. > >> +int err = 0; >> +ssize_t bytes_to_write; >> + >> +while (!err) { >> +bytes_to_write = xread(fd_in, buf, sizeof(buf)); >> +if (bytes_to_write < 0) >> +return COPY_READ_ERROR; >> +if (bytes_to_write == 0) >> +break; >> +err = packet_write_gently(fd_out, buf, bytes_to_write); >> +} >> +if (!err) >> +err = packet_flush_gently(fd_out); >> +return err; >> +} > > Looks good: clean and readable. > > Sidenote (probably outside of scope of this patch): what are the > errors that we can get from this function, beside COPY_READ_ERROR > of course? Everything that is returned by "read()" >> + >> static int get_packet_data(int fd, char **src_buf, size_t *src_size, >> void *dst, unsigned size, int options) >> { >> @@ -305,3 +346,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, >> int *dst_len) >> { >> return packet_read_line_generic(-1, src, src_len, dst_len); >> } >> + >> +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out) > > It's a bit strange that the signature of write_packetized_from_buf() is > that different from read_packetized_to_buf(). This includes the return > value: int vs ssize_t. As I have checked, write() and read() both > use ssize_t, while fread() and fwrite() both use size_t. read_packetized_to_buf() returns the number of bytes read or a negative error code. write_packetized_from_buf() returns 0 if the call was successful and an error code if not. That's the reason these two functions have a different signature > Perhaps this function should be named read_packetized_to_strbuf() > (err, I asked this already)? I agree with the rename as makes it distinct from write_packetized_from_buf(). >> +{ >> +int paket_len; > > Possible typo: shouldn't it be called packet_len? True! > Shouldn't it be initialized to 0? Well, it is set for sure later. That's why I think it is not necessary. Plus, Eric Wong thought me not to: "Compilers complain about uninitialized variables." http://public-inbox.org/git/20160725072745.GB11634@starla/ (Note: he was talking about pointers there :-) >> +int options = PACKET_READ_GENTLE_ON_EOF; > > Why is this even a local variable? It is never changed, and it is > used only in one place; we can inline it. Removed. >> + >> +size_t oldlen = sb_out->len; >> +size_t oldalloc = sb_out->alloc; > > Just a nitpick (feel free to ignore): doesn't this looks better: > > +size_t old_len = sb_out->len; > +size_t old_alloc = sb_out->alloc; > > Also perhaps s/old_/orig_/g. Agreed. That matches the other variables better. >> +strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1); >> +paket_len = packet_read(fd_in, NULL, NULL, >> +sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, >> options); > > A question (which perhaps was answered during the development of this > patch series): why is this +1 in PKTLINE_DATA_MAXLEN+1 here? Nice catch. I think this is wrong: https://github.com/git/git/blob/6fe1b1407ed91823daa5d487abe457ff37463349/pkt-line.c#L196 It should be "if (len > size)" ... then we don't need the "+1" here. (but I need to think a bit more about this) > >> +if (paket_len <= 0) >> +break; >> +sb_out->len += paket_len; >> +} >> + >> +if (paket_len < 0) { >> +if (oldalloc == 0) >> +strbuf_release(sb_out); >> +else >> +strbuf_setlen(sb_out, oldlen); > > A question (maybe I don't understand strbufs): why there is a special > case for oldalloc == 0? I tried to mimic the behavior of strbuf_read() [1]. The error handling was introduced in 2fc647 [2] to ease error handling: "This allows for easier error handling, as callers only need to call strbuf_release() if A) the command succeeded or B) if they would have had to do so anyway because they added something to the strbuf themselves." [1] https://github.com/git/git/blob/cda1bbd474805e653dda8a71d4ea3790e2a66cbb/strbuf.c#L377-L383 [2] https://github.com/git/git/commit/2fc647004ac7016128372a85db8245581e493812 Thanks, Lars
Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze: > From: Lars Schneider > > write_packetized_from_fd() and write_packetized_from_buf() write a > stream of packets. All content packets use the maximal packet size > except for the last one. After the last content packet a `flush` control > packet is written. > > read_packetized_to_buf() reads arbitrary sized packets until it detects > a `flush` packet. I guess that read_packetized_to_fd(), for completeness, is not needed for the filter protocol (though it might be useful for the receive side of send-pack / receive-pack). Also, should it be read_packetized_to_strbuf()? I guess using strbuf to read is here because we might need more size to read in full, isn't it. > > Signed-off-by: Lars Schneider > --- > pkt-line.c | 68 > ++ > pkt-line.h | 7 +++ > 2 files changed, 75 insertions(+) > > diff --git a/pkt-line.c b/pkt-line.c > index fc0ac12..a0a8543 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -196,6 +196,47 @@ void packet_buf_write(struct strbuf *buf, const char > *fmt, ...) > va_end(args); > } > > +int write_packetized_from_fd(int fd_in, int fd_out) I wonder if it would be worth it to name parameters in such way that it is known from the name which one is to be packetized, for example fd_out_pkt here... But it might be not worth it; you can get it from the function name. > +{ > + static char buf[PKTLINE_DATA_MAXLEN]; Static buffer means not thread-safe and not reentrant. It would be nice to have this information in a comment for this function, but it is not necessary. Also, is using static variable better than using global variable `packet_buffer`? Well, scope for weird interactions is smaller... Sidenote: we have LARGE_PACKET_MAX (used in previous patch), but PKTLINE_DATA_MAXLEN not LARGE_PACKET_DATA_MAX. > + int err = 0; > + ssize_t bytes_to_write; > + > + while (!err) { > + bytes_to_write = xread(fd_in, buf, sizeof(buf)); > + if (bytes_to_write < 0) > + return COPY_READ_ERROR; > + if (bytes_to_write == 0) > + break; > + err = packet_write_gently(fd_out, buf, bytes_to_write); > + } > + if (!err) > + err = packet_flush_gently(fd_out); > + return err; > +} Looks good: clean and readable. Sidenote (probably outside of scope of this patch): what are the errors that we can get from this function, beside COPY_READ_ERROR of course? > + > +int write_packetized_from_buf(const char *src_in, size_t len, int fd_out) > +{ > + static char buf[PKTLINE_DATA_MAXLEN]; > + int err = 0; > + size_t bytes_written = 0; > + size_t bytes_to_write; Those two variables, instead of modifying the values of len and/or src_in, make code very easy to read. > + > + while (!err) { > + if ((len - bytes_written) > sizeof(buf)) > + bytes_to_write = sizeof(buf); > + else > + bytes_to_write = len - bytes_written; > + if (bytes_to_write == 0) > + break; > + err = packet_write_gently(fd_out, src_in + bytes_written, > bytes_to_write); > + bytes_written += bytes_to_write; > + } > + if (!err) > + err = packet_flush_gently(fd_out); > + return err; > +} Looks good: clean and readable. > + > static int get_packet_data(int fd, char **src_buf, size_t *src_size, > void *dst, unsigned size, int options) > { > @@ -305,3 +346,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, > int *dst_len) > { > return packet_read_line_generic(-1, src, src_len, dst_len); > } > + > +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out) It's a bit strange that the signature of write_packetized_from_buf() is that different from read_packetized_to_buf(). This includes the return value: int vs ssize_t. As I have checked, write() and read() both use ssize_t, while fread() and fwrite() both use size_t. Perhaps this function should be named read_packetized_to_strbuf() (err, I asked this already)? > +{ > + int paket_len; Possible typo: shouldn't it be called packet_len? Shouldn't it be initialized to 0? + int packet_len = 0; > + int options = PACKET_READ_GENTLE_ON_EOF; Why is this even a local variable? It is never changed, and it is used only in one place; we can inline it. If it would be needed in subsequent patches, then such information should be included in the commit message. > + > + size_t oldlen = sb_out->len; > + size_t oldalloc = sb_out->alloc; Just a nitpick (feel free to ignore): doesn't this looks better: + size_t old_len = sb_out->len; + size_t old_alloc = sb_out->alloc; Also perhaps s/old_/orig_/g. > + > + for (;;) { I see that you used the more popular way of coding forever loop: $ git grep 'for (;;
[PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams
From: Lars Schneider write_packetized_from_fd() and write_packetized_from_buf() write a stream of packets. All content packets use the maximal packet size except for the last one. After the last content packet a `flush` control packet is written. read_packetized_to_buf() reads arbitrary sized packets until it detects a `flush` packet. Signed-off-by: Lars Schneider --- pkt-line.c | 68 ++ pkt-line.h | 7 +++ 2 files changed, 75 insertions(+) diff --git a/pkt-line.c b/pkt-line.c index fc0ac12..a0a8543 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -196,6 +196,47 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...) va_end(args); } +int write_packetized_from_fd(int fd_in, int fd_out) +{ + static char buf[PKTLINE_DATA_MAXLEN]; + int err = 0; + ssize_t bytes_to_write; + + while (!err) { + bytes_to_write = xread(fd_in, buf, sizeof(buf)); + if (bytes_to_write < 0) + return COPY_READ_ERROR; + if (bytes_to_write == 0) + break; + err = packet_write_gently(fd_out, buf, bytes_to_write); + } + if (!err) + err = packet_flush_gently(fd_out); + return err; +} + +int write_packetized_from_buf(const char *src_in, size_t len, int fd_out) +{ + static char buf[PKTLINE_DATA_MAXLEN]; + int err = 0; + size_t bytes_written = 0; + size_t bytes_to_write; + + while (!err) { + if ((len - bytes_written) > sizeof(buf)) + bytes_to_write = sizeof(buf); + else + bytes_to_write = len - bytes_written; + if (bytes_to_write == 0) + break; + err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write); + bytes_written += bytes_to_write; + } + if (!err) + err = packet_flush_gently(fd_out); + return err; +} + static int get_packet_data(int fd, char **src_buf, size_t *src_size, void *dst, unsigned size, int options) { @@ -305,3 +346,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len) { return packet_read_line_generic(-1, src, src_len, dst_len); } + +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out) +{ + int paket_len; + int options = PACKET_READ_GENTLE_ON_EOF; + + size_t oldlen = sb_out->len; + size_t oldalloc = sb_out->alloc; + + for (;;) { + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1); + paket_len = packet_read(fd_in, NULL, NULL, + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, options); + if (paket_len <= 0) + break; + sb_out->len += paket_len; + } + + if (paket_len < 0) { + if (oldalloc == 0) + strbuf_release(sb_out); + else + strbuf_setlen(sb_out, oldlen); + return paket_len; + } + return sb_out->len - oldlen; +} diff --git a/pkt-line.h b/pkt-line.h index 3fa0899..6df8449 100644 --- a/pkt-line.h +++ b/pkt-line.h @@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf); void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3))); int packet_flush_gently(int fd); int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3))); +int write_packetized_from_fd(int fd_in, int fd_out); +int write_packetized_from_buf(const char *src_in, size_t len, int fd_out); /* * Read a packetized line into the buffer, which must be at least size bytes @@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size); */ char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size); +/* + * Reads a stream of variable sized packets until a flush packet is detected. + */ +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out); + #define DEFAULT_PACKET_MAX 1000 #define LARGE_PACKET_MAX 65520 extern char packet_buffer[LARGE_PACKET_MAX]; -- 2.10.0