Re: [PATCH v2] fetch-pack: optionally save packs to disk
Am 11.06.2015 um 20:59 schrieb Augie Fackler: When developing server software, it's often helpful to save a potentially-bogus pack for later analysis. This makes that trivial, instead of painful. When you develop server software, shouldn't you test drive the server via the bare metal protocol anyway? That *is* painful, but unavoidable because you must harden the server against any garbage that a potentially malicous client could throw at it. Restricting yourself to a well-behaved client such as fetch-pack is only half the deal. That said, I do think that fetch-pack could learn a mode that makes it easier to debug the normal behavior of a server (if such a mode is missing currently). What is the problem with the current fetch-pack implementation? Does it remove a bogus packfile after download? Does it abort during download when it detects a broken packfile? Does --keep not do what you need? Instead of your approach (which forks off tee to dump a copy of the packfile), would it not be simpler to add an option --debug-pack (probably not the best name) that skips the cleanup step when a broken packfile is detected and prints the name of the downloaded packfile? diff --git a/fetch-pack.c b/fetch-pack.c index a912935..fe6ba58 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args, const char *argv[22]; char keep_arg[256]; char hdr_arg[256]; - const char **av, *cmd_name; + const char **av, *cmd_name, *savepath; int do_keep = args-keep_pack; struct child_process cmd = CHILD_PROCESS_INIT; int ret; @@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args, cmd.argv = argv; av = argv; *hdr_arg = 0; + struct pack_header header; if (!args-keep_pack unpack_limit) { - struct pack_header header; - if (read_pack_header(demux.out, header)) die(protocol error: bad pack header); snprintf(hdr_arg, sizeof(hdr_arg), @@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args, *av++ = --strict; *av++ = NULL; - cmd.in = demux.out; + savepath = getenv(GIT_SAVE_FETCHED_PACK_TO); + if (savepath) { + struct child_process cmd2 = CHILD_PROCESS_INIT; + const char *argv2[22]; + int pipefds[2]; + int e; + const char **av2; + cmd2.argv = argv2; + av2 = argv2; + *av2++ = tee; + if (*hdr_arg) { + /* hdr_arg being nonempty means we already read the +* pack header from demux, so we need to drop a pack +* header in place for tee to append to, otherwise +* we'll end up with a broken pack on disk. +*/ /* * Write multi-line comments * like this (/* on its own line) */ + int fp; + struct sha1file *s; + fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666); + s = sha1fd_throughput(fp, savepath, NULL); + sha1write(s, header, sizeof(header)); + sha1flush(s); Are you abusing sha1write() and sha1flush() to write a byte sequence to a file? Is write_in_full() not sufficient? + close(fp); + /* -a is supported by both GNU and BSD tee */ + *av2++ = -a; + } + *av2++ = savepath; + *av2++ = NULL; + cmd2.in = demux.out; + e = pipe(pipefds); + if (e != 0) + die(couldn't make pipe to save pack); start_command() can create the pipe for you. Just say cmd2.out = -1. + cmd2.out = pipefds[1]; + cmd.in = pipefds[0]; + if (start_command(cmd2)) + die(couldn't start tee to save a pack); When you call start_command(), you must also call finish_command(). start_command() prints an error message for you; you don't have to do that (the start_command() in the context below is a bad example). + } else + cmd.in = demux.out; cmd.git_cmd = 1; if (start_command(cmd)) die(fetch-pack: unable to fork off %s, cmd_name); diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 58207d8..bf4640d 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -82,11 +82,23 @@ test_expect_success 'fetch changes via http' ' test_cmp file clone/file ' +test_expect_success 'fetch changes via http and save pack' ' + echo content file + git commit -a -m two
Re: [PATCH v2] fetch-pack: optionally save packs to disk
On Fri, Jun 12, 2015 at 02:00:05PM -0400, Jeff King wrote: When I added GIT_TRACE_PACKET long ago, I had always intended to follow-up with a GIT_TRACE_PACKFILE. The former stops tracing when we get to the binary data, but I had intended the latter to store the pure on-the-wire packfile transmission for later debugging to. I never got around to it, but I think it is something like the patch below. Here it is, a bit more cleaned up. I think this is a nice improvement, and it does fix some minor issues in the existing GIT_TRACE_PACKET code. But I don't think it will ever work for receive-pack, because we hand off the socket directly to index-pack. I can live with that. But another approach would be to make it easier to capture the stdin of programs with GIT_TRACE, rather than just their arguments. That makes it more generically usable (e.g., I have hacky patches on our servers for capturing pack-objects input so I can replay expensive fetch requests). As Augie noted, that's not a full pack-file due to the --pack-header stuff. But given a full trace (which has both the arguments and stdin), you could reconstruct the packfile, which we could do as a post-processing step. I dunno. I prefer introducing a nicely generic mechanism, but it would probably be a little more irritating to use in practice. I guess yet another way to do it would be to put the GIT_TRACE_PACK intelligence into index-pack and unpack-objects (rather than fetch-pack). Then it at least applies to both the pushing and fetching sides. Anyway, here are the patches I came up with: [1/3]: pkt-line: simplify starts_with checks in packet tracing [2/3]: pkt-line: tighten sideband PACK check when tracing [3/3]: pkt-line: support tracing verbatim pack contents -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fetch-pack: optionally save packs to disk
Johannes Sixt j...@kdbg.org writes: What is the problem with the current fetch-pack implementation? Does it remove a bogus packfile after download? Does it abort during download when it detects a broken packfile? Does --keep not do what you need? Doesn't the incoming data still go through the fattening process, though? You will not be able to inspect the byte-for-byte identical stream that came out of the server end whose packfile generation logic is suspect. For the purpose of debugging your own new server implementation, it might be a better approach to capture the pack as it comes out at the server end, instead of doing it at the fetch-pack end as it comes in. But the approach to add this dump at the fetch-pack side is that it gives us a tool to diagnose problems that come from broken server (re)implementations by other people we cannot debug, i.e. you are spewing this corrupt pack against this request; here is a dump we took to help you go fix your server. Instead of your approach (which forks off tee to dump a copy of the packfile), would it not be simpler to add an option --debug-pack (probably not the best name) that skips the cleanup step when a broken packfile is detected and prints the name of the downloaded packfile? As long as we need to debug a thin pack that comes from the other end, that approach is not sufficient, I am afraid. I anticipated that you'd have problem with its use of tee. It probably can do this internally with async interface, perhaps, instead? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fetch-pack: optionally save packs to disk
On Fri, Jun 12, 2015 at 11:07 AM, Junio C Hamano gits...@pobox.com wrote: What is the problem with the current fetch-pack implementation? Does it remove a bogus packfile after download? Does it abort during download when it detects a broken packfile? Does --keep not do what you need? Doesn't the incoming data still go through the fattening process, though? You will not be able to inspect the byte-for-byte identical stream that came out of the server end whose packfile generation logic is suspect. For the purpose of debugging your own new server implementation, it might be a better approach to capture the pack as it comes out at the server end, instead of doing it at the fetch-pack end as it comes in. But the approach to add this dump at the fetch-pack side is that it gives us a tool to diagnose problems that come from broken server (re)implementations by other people we cannot debug, i.e. you are spewing this corrupt pack against this request; here is a dump we took to help you go fix your server. *nod* that's an important part of it. Also, in the small-pull case, the pack data gets sent to unpack-objects anyway, so git is never saving the packfile anywhere in that case (I think it's for a pull of less than 100 objects, which characterizes most of my reduced test cases for weirdness.) Instead of your approach (which forks off tee to dump a copy of the packfile), would it not be simpler to add an option --debug-pack (probably not the best name) that skips the cleanup step when a broken packfile is detected and prints the name of the downloaded packfile? As long as we need to debug a thin pack that comes from the other end, that approach is not sufficient, I am afraid. I anticipated that you'd have problem with its use of tee. It probably can do this internally with async interface, perhaps, instead? I'd be happy to make such changes if that's the consensus and someone can give me pointers. My C is admittedly pretty rusty from non-use, and I'm not at all familiar with git's codebase, but I'll at least try. Thanks! -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fetch-pack: optionally save packs to disk
On Fri, Jun 12, 2015 at 2:22 AM, Johannes Sixt j...@kdbg.org wrote: Am 11.06.2015 um 20:59 schrieb Augie Fackler: When developing server software, it's often helpful to save a potentially-bogus pack for later analysis. This makes that trivial, instead of painful. When you develop server software, shouldn't you test drive the server via the bare metal protocol anyway? That *is* painful, but unavoidable because you must harden the server against any garbage that a potentially malicous client could throw at it. Restricting yourself to a well-behaved client such as fetch-pack is only half the deal. We do that too, but sometimes I've encountered an edge case that's trivially reproduced from an existing repo, and going through the work to manually drive the server is a monumental pain in the butt, and all I *really* need is to see the bytes sent from the server to the client. If it weren't for SSL-everywhere, I'd probably just do this with wireshark, but that's not the world I live in. That said, I do think that fetch-pack could learn a mode that makes it easier to debug the normal behavior of a server (if such a mode is missing currently). What is the problem with the current fetch-pack implementation? Does it remove a bogus packfile after download? Does it abort during download when it detects a broken packfile? Does --keep not do what you need? fetch-pack doesn't store the pack anywhere - it's sending it to index-pack (or unpack-objects) using --stdin, which means that the raw bytes from the server currently are never materialized anywhere on disk. Having index-pack do this is too late, because it's doing things like rewriting the pack header in a potentially new format. (Junio also covered this well, thanks!) Instead of your approach (which forks off tee to dump a copy of the packfile), would it not be simpler to add an option --debug-pack (probably not the best name) that skips the cleanup step when a broken packfile is detected and prints the name of the downloaded packfile? diff --git a/fetch-pack.c b/fetch-pack.c index a912935..fe6ba58 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args, const char *argv[22]; char keep_arg[256]; char hdr_arg[256]; - const char **av, *cmd_name; + const char **av, *cmd_name, *savepath; int do_keep = args-keep_pack; struct child_process cmd = CHILD_PROCESS_INIT; int ret; @@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args, cmd.argv = argv; av = argv; *hdr_arg = 0; + struct pack_header header; if (!args-keep_pack unpack_limit) { - struct pack_header header; - if (read_pack_header(demux.out, header)) die(protocol error: bad pack header); snprintf(hdr_arg, sizeof(hdr_arg), @@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args, *av++ = --strict; *av++ = NULL; - cmd.in = demux.out; + savepath = getenv(GIT_SAVE_FETCHED_PACK_TO); + if (savepath) { + struct child_process cmd2 = CHILD_PROCESS_INIT; + const char *argv2[22]; + int pipefds[2]; + int e; + const char **av2; + cmd2.argv = argv2; + av2 = argv2; + *av2++ = tee; + if (*hdr_arg) { + /* hdr_arg being nonempty means we already read the +* pack header from demux, so we need to drop a pack +* header in place for tee to append to, otherwise +* we'll end up with a broken pack on disk. +*/ /* * Write multi-line comments * like this (/* on its own line) */ + int fp; + struct sha1file *s; + fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666); + s = sha1fd_throughput(fp, savepath, NULL); + sha1write(s, header, sizeof(header)); + sha1flush(s); Are you abusing sha1write() and sha1flush() to write a byte sequence to a file? Is write_in_full() not sufficient? I didn't know about write_in_full. I'm very much *not* familiar with git's codebase - I know the protocols and formats reasonably well, but have needed only occasionally to look at the code. That works, thanks. + close(fp); + /* -a is supported by both GNU and BSD tee */ + *av2++ = -a; + } + *av2++ = savepath; + *av2++ = NULL; + cmd2.in = demux.out; + e = pipe(pipefds); +
Re: [PATCH v2] fetch-pack: optionally save packs to disk
On Fri, Jun 12, 2015 at 08:07:36AM -0700, Junio C Hamano wrote: Johannes Sixt j...@kdbg.org writes: What is the problem with the current fetch-pack implementation? Does it remove a bogus packfile after download? Does it abort during download when it detects a broken packfile? Does --keep not do what you need? Doesn't the incoming data still go through the fattening process, though? You will not be able to inspect the byte-for-byte identical stream that came out of the server end whose packfile generation logic is suspect. For the purpose of debugging your own new server implementation, it might be a better approach to capture the pack as it comes out at the server end, instead of doing it at the fetch-pack end as it comes in. But the approach to add this dump at the fetch-pack side is that it gives us a tool to diagnose problems that come from broken server (re)implementations by other people we cannot debug, i.e. you are spewing this corrupt pack against this request; here is a dump we took to help you go fix your server. When I added GIT_TRACE_PACKET long ago, I had always intended to follow-up with a GIT_TRACE_PACKFILE. The former stops tracing when we get to the binary data, but I had intended the latter to store the pure on-the-wire packfile transmission for later debugging to. I never got around to it, but I think it is something like the patch below. With: GIT_TRACE_PACKET=1 GIT_TRACE_PACK=/tmp/foo.pack git clone ... this yields a usable pack in /tmp/foo.pack (note that it only kicks in when packet-tracing is on at all; if we restructure the code a bit, we can remove that limitation). In theory it would also work when receiving a pack via push, but I think we actually skip the pkt-line protocol there. We'd have to manually check GIT_TRACE_PACK. Also, as a bonus, it means we do not stop tracing completely when we start to receive a sideband pack. The current GIT_TRACE_PACKET code misses any sideband-2 messages that come after we start receiving the pack. diff --git a/pkt-line.c b/pkt-line.c index 187a229..f82871a 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -4,20 +4,39 @@ char packet_buffer[LARGE_PACKET_MAX]; static const char *packet_trace_prefix = git; static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET); +static struct trace_key trace_pack = TRACE_KEY_INIT(PACK); void packet_trace_identity(const char *prog) { packet_trace_prefix = xstrdup(prog); } +static int packet_trace_pack(const char *buf, unsigned int len, int sideband) +{ + if (!sideband) { + trace_verbatim(trace_pack, buf, len); + return 1; + } else if (len *buf == '\1') { + trace_verbatim(trace_pack, buf + 1, len - 1); + return 1; + } else { + /* it's another non-pack sideband */ + return 0; + } +} + static void packet_trace(const char *buf, unsigned int len, int write) { int i; struct strbuf out; + static int in_pack, sideband; if (!trace_want(trace_packet)) return; + if (in_pack packet_trace_pack(buf, len, sideband)) + return; + /* +32 is just a guess for header + quoting */ strbuf_init(out, len+32); @@ -27,7 +46,9 @@ static void packet_trace(const char *buf, unsigned int len, int write) if ((len = 4 starts_with(buf, PACK)) || (len = 5 starts_with(buf+1, PACK))) { strbuf_addstr(out, PACK ...); - trace_disable(trace_packet); + in_pack = 1; + sideband = *buf == '\1'; + packet_trace_pack(buf, len, sideband); } else { /* XXX we should really handle printable utf8 */ diff --git a/trace.c b/trace.c index 3c3bd8f..7393926 100644 --- a/trace.c +++ b/trace.c @@ -120,6 +120,13 @@ static int prepare_trace_line(const char *file, int line, return 1; } +void trace_verbatim(struct trace_key *key, const void *buf, unsigned len) +{ + if (!trace_want(key)) + return; + write_or_whine_pipe(get_trace_fd(key), buf, len, err_msg); +} + static void print_trace_line(struct trace_key *key, struct strbuf *buf) { strbuf_complete_line(buf); diff --git a/trace.h b/trace.h index ae6a332..179b249 100644 --- a/trace.h +++ b/trace.h @@ -18,6 +18,7 @@ extern int trace_want(struct trace_key *key); extern void trace_disable(struct trace_key *key); extern uint64_t getnanotime(void); extern void trace_command_performance(const char **argv); +extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len); #ifndef HAVE_VARIADIC_MACROS -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] fetch-pack: optionally save packs to disk
When developing server software, it's often helpful to save a potentially-bogus pack for later analysis. This makes that trivial, instead of painful. This is made a little complicated by the fact that in some cases (like cloning from smart-http, but not from a local repo) the fetch code reads the pack header before sending the pack to index-pack (which then gets a --pack_header flag). The included tests cover both of these cases. To use the new feature, set GIT_SAVE_FETCHED_PACK_TO to a file path and git-fetch will do the rest. The resulting pack can be examined with git-index-pack or similar tools (although if it's corrupt, custom tools can be especially helpful.) Signed-off-by: Augie Fackler au...@google.com --- Documentation/git.txt | 6 ++ fetch-pack.c| 44 t/t5551-http-fetch-smart.sh | 12 t/t5601-clone.sh| 9 + 4 files changed, 67 insertions(+), 4 deletions(-) diff --git a/Documentation/git.txt b/Documentation/git.txt index 45b64a7..31bc3b5 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -1060,6 +1060,12 @@ GIT_ICASE_PATHSPECS:: an operation has touched every ref (e.g., because you are cloning a repository to make a backup). +`GIT_SAVE_FETCHED_PACK_TO`:: + If set, save any fetched pack to the path in the + variable. This is mostly useful if you're writing a custom + server and are producing broken packs, as the saved pack won't + be cleaned up even if it's corrupt. + Discussion[[Discussion]] diff --git a/fetch-pack.c b/fetch-pack.c index a912935..fe6ba58 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args, const char *argv[22]; char keep_arg[256]; char hdr_arg[256]; - const char **av, *cmd_name; + const char **av, *cmd_name, *savepath; int do_keep = args-keep_pack; struct child_process cmd = CHILD_PROCESS_INIT; int ret; @@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args, cmd.argv = argv; av = argv; *hdr_arg = 0; + struct pack_header header; if (!args-keep_pack unpack_limit) { - struct pack_header header; - if (read_pack_header(demux.out, header)) die(protocol error: bad pack header); snprintf(hdr_arg, sizeof(hdr_arg), @@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args, *av++ = --strict; *av++ = NULL; - cmd.in = demux.out; + savepath = getenv(GIT_SAVE_FETCHED_PACK_TO); + if (savepath) { + struct child_process cmd2 = CHILD_PROCESS_INIT; + const char *argv2[22]; + int pipefds[2]; + int e; + const char **av2; + cmd2.argv = argv2; + av2 = argv2; + *av2++ = tee; + if (*hdr_arg) { + /* hdr_arg being nonempty means we already read the +* pack header from demux, so we need to drop a pack +* header in place for tee to append to, otherwise +* we'll end up with a broken pack on disk. +*/ + int fp; + struct sha1file *s; + fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666); + s = sha1fd_throughput(fp, savepath, NULL); + sha1write(s, header, sizeof(header)); + sha1flush(s); + close(fp); + /* -a is supported by both GNU and BSD tee */ + *av2++ = -a; + } + *av2++ = savepath; + *av2++ = NULL; + cmd2.in = demux.out; + e = pipe(pipefds); + if (e != 0) + die(couldn't make pipe to save pack); + cmd2.out = pipefds[1]; + cmd.in = pipefds[0]; + if (start_command(cmd2)) + die(couldn't start tee to save a pack); + } else + cmd.in = demux.out; cmd.git_cmd = 1; if (start_command(cmd)) die(fetch-pack: unable to fork off %s, cmd_name); diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 58207d8..bf4640d 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -82,11 +82,23 @@ test_expect_success 'fetch changes via http' ' test_cmp file clone/file ' +test_expect_success 'fetch changes via http and save pack' ' + echo content file + git commit -a -m two + git push public + GIT_SAVE_FETCHED_PACK_TO=saved.pack + export GIT_SAVE_FETCHED_PACK_TO + (cd clone git pull) + git