Re: [PATCH v2] fetch-pack: optionally save packs to disk

2015-06-12 Thread Johannes Sixt

Am 11.06.2015 um 20:59 schrieb Augie Fackler:

When developing server software, it's often helpful to save a
potentially-bogus pack for later analysis. This makes that trivial,
instead of painful.


When you develop server software, shouldn't you test drive the server 
via the bare metal protocol anyway? That *is* painful, but unavoidable 
because you must harden the server against any garbage that a 
potentially malicous client could throw at it. Restricting yourself to a 
well-behaved client such as fetch-pack is only half the deal.


That said, I do think that fetch-pack could learn a mode that makes it 
easier to debug the normal behavior of a server (if such a mode is 
missing currently).


What is the problem with the current fetch-pack implementation? Does it 
remove a bogus packfile after download? Does it abort during download 
when it detects a broken packfile? Does --keep not do what you need?


Instead of your approach (which forks off tee to dump a copy of the 
packfile), would it not be simpler to add an option --debug-pack 
(probably not the best name) that skips the cleanup step when a broken 
packfile is detected and prints the name of the downloaded packfile?



diff --git a/fetch-pack.c b/fetch-pack.c
index a912935..fe6ba58 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args,
const char *argv[22];
char keep_arg[256];
char hdr_arg[256];
-   const char **av, *cmd_name;
+   const char **av, *cmd_name, *savepath;
int do_keep = args-keep_pack;
struct child_process cmd = CHILD_PROCESS_INIT;
int ret;
@@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args,
cmd.argv = argv;
av = argv;
*hdr_arg = 0;
+   struct pack_header header;
if (!args-keep_pack  unpack_limit) {
-   struct pack_header header;
-
if (read_pack_header(demux.out, header))
die(protocol error: bad pack header);
snprintf(hdr_arg, sizeof(hdr_arg),
@@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args,
*av++ = --strict;
*av++ = NULL;

-   cmd.in = demux.out;
+   savepath = getenv(GIT_SAVE_FETCHED_PACK_TO);
+   if (savepath) {
+   struct child_process cmd2 = CHILD_PROCESS_INIT;
+   const char *argv2[22];
+   int pipefds[2];
+   int e;
+   const char **av2;
+   cmd2.argv = argv2;
+   av2 = argv2;
+   *av2++ = tee;
+   if (*hdr_arg) {
+   /* hdr_arg being nonempty means we already read the
+* pack header from demux, so we need to drop a pack
+* header in place for tee to append to, otherwise
+* we'll end up with a broken pack on disk.
+*/


/*
 * Write multi-line comments
 * like this (/* on its own line)
 */


+   int fp;
+   struct sha1file *s;
+   fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666);
+   s = sha1fd_throughput(fp, savepath, NULL);
+   sha1write(s, header, sizeof(header));
+   sha1flush(s);


Are you abusing sha1write() and sha1flush() to write a byte sequence to 
a file? Is write_in_full() not sufficient?



+   close(fp);
+   /* -a is supported by both GNU and BSD tee */
+   *av2++ = -a;
+   }
+   *av2++ = savepath;
+   *av2++ = NULL;
+   cmd2.in = demux.out;
+   e = pipe(pipefds);
+   if (e != 0)
+   die(couldn't make pipe to save pack);


start_command() can create the pipe for you. Just say cmd2.out = -1.


+   cmd2.out = pipefds[1];
+   cmd.in = pipefds[0];
+   if (start_command(cmd2))
+   die(couldn't start tee to save a pack);


When you call start_command(), you must also call finish_command(). 
start_command() prints an error message for you; you don't have to do 
that (the start_command() in the context below is a bad example).



+   } else
+   cmd.in = demux.out;
cmd.git_cmd = 1;
if (start_command(cmd))
die(fetch-pack: unable to fork off %s, cmd_name);
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 58207d8..bf4640d 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -82,11 +82,23 @@ test_expect_success 'fetch changes via http' '
test_cmp file clone/file
  '

+test_expect_success 'fetch changes via http and save pack' '
+   echo content file 
+   git commit -a -m two 

Re: [PATCH v2] fetch-pack: optionally save packs to disk

2015-06-12 Thread Jeff King
On Fri, Jun 12, 2015 at 02:00:05PM -0400, Jeff King wrote:

 When I added GIT_TRACE_PACKET long ago, I had always intended to
 follow-up with a GIT_TRACE_PACKFILE. The former stops tracing when we
 get to the binary data, but I had intended the latter to store the pure
 on-the-wire packfile transmission for later debugging to. I never got
 around to it, but I think it is something like the patch below.

Here it is, a bit more cleaned up. I think this is a nice improvement,
and it does fix some minor issues in the existing GIT_TRACE_PACKET code.
But I don't think it will ever work for receive-pack, because we hand
off the socket directly to index-pack.

I can live with that. But another approach would be to make it easier to
capture the stdin of programs with GIT_TRACE, rather than just their
arguments. That makes it more generically usable (e.g., I have hacky
patches on our servers for capturing pack-objects input so I can
replay expensive fetch requests). As Augie noted, that's not a full
pack-file due to the --pack-header stuff. But given a full trace (which
has both the arguments and stdin), you could reconstruct the packfile,
which we could do as a post-processing step.

I dunno. I prefer introducing a nicely generic mechanism, but it would
probably be a little more irritating to use in practice.

I guess yet another way to do it would be to put the GIT_TRACE_PACK
intelligence into index-pack and unpack-objects (rather than
fetch-pack). Then it at least applies to both the pushing and fetching
sides.

Anyway, here are the patches I came up with:

  [1/3]: pkt-line: simplify starts_with checks in packet tracing
  [2/3]: pkt-line: tighten sideband PACK check when tracing
  [3/3]: pkt-line: support tracing verbatim pack contents

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fetch-pack: optionally save packs to disk

2015-06-12 Thread Junio C Hamano
Johannes Sixt j...@kdbg.org writes:

 What is the problem with the current fetch-pack implementation? Does
 it remove a bogus packfile after download? Does it abort during
 download when it detects a broken packfile? Does --keep not do what
 you need?

Doesn't the incoming data still go through the fattening process,
though?  You will not be able to inspect the byte-for-byte identical
stream that came out of the server end whose packfile generation
logic is suspect.

For the purpose of debugging your own new server implementation, it
might be a better approach to capture the pack as it comes out at
the server end, instead of doing it at the fetch-pack end as it
comes in. But the approach to add this dump at the fetch-pack side
is that it gives us a tool to diagnose problems that come from
broken server (re)implementations by other people we cannot debug,
i.e. you are spewing this corrupt pack against this request; here
is a dump we took to help you go fix your server.

 Instead of your approach (which forks off tee to dump a copy of the
 packfile), would it not be simpler to add an option --debug-pack
 (probably not the best name) that skips the cleanup step when a broken
 packfile is detected and prints the name of the downloaded packfile?

As long as we need to debug a thin pack that comes from the other
end, that approach is not sufficient, I am afraid.

I anticipated that you'd have problem with its use of tee.  It
probably can do this internally with async interface, perhaps,
instead?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fetch-pack: optionally save packs to disk

2015-06-12 Thread Augie Fackler
On Fri, Jun 12, 2015 at 11:07 AM, Junio C Hamano gits...@pobox.com wrote:
 What is the problem with the current fetch-pack implementation? Does
 it remove a bogus packfile after download? Does it abort during
 download when it detects a broken packfile? Does --keep not do what
 you need?

 Doesn't the incoming data still go through the fattening process,
 though?  You will not be able to inspect the byte-for-byte identical
 stream that came out of the server end whose packfile generation
 logic is suspect.

 For the purpose of debugging your own new server implementation, it
 might be a better approach to capture the pack as it comes out at
 the server end, instead of doing it at the fetch-pack end as it
 comes in. But the approach to add this dump at the fetch-pack side
 is that it gives us a tool to diagnose problems that come from
 broken server (re)implementations by other people we cannot debug,
 i.e. you are spewing this corrupt pack against this request; here
 is a dump we took to help you go fix your server.

*nod* that's an important part of it. Also, in the small-pull case,
the pack data gets sent to unpack-objects anyway, so git is never
saving the packfile anywhere in that case (I think it's for a pull of
less than 100 objects, which characterizes most of my reduced test
cases for weirdness.)


 Instead of your approach (which forks off tee to dump a copy of the
 packfile), would it not be simpler to add an option --debug-pack
 (probably not the best name) that skips the cleanup step when a broken
 packfile is detected and prints the name of the downloaded packfile?

 As long as we need to debug a thin pack that comes from the other
 end, that approach is not sufficient, I am afraid.

 I anticipated that you'd have problem with its use of tee.  It
 probably can do this internally with async interface, perhaps,
 instead?

I'd be happy to make such changes if that's the consensus and someone
can give me pointers. My C is admittedly pretty rusty from non-use,
and I'm not at all familiar with git's codebase, but I'll at least
try.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fetch-pack: optionally save packs to disk

2015-06-12 Thread Augie Fackler
On Fri, Jun 12, 2015 at 2:22 AM, Johannes Sixt j...@kdbg.org wrote:

 Am 11.06.2015 um 20:59 schrieb Augie Fackler:

 When developing server software, it's often helpful to save a
 potentially-bogus pack for later analysis. This makes that trivial,
 instead of painful.


 When you develop server software, shouldn't you test drive the server via
the bare metal protocol anyway? That *is* painful, but unavoidable because
you must harden the server against any garbage that a potentially malicous
client could throw at it. Restricting yourself to a well-behaved client
such as fetch-pack is only half the deal.


We do that too, but sometimes I've encountered an edge case that's
trivially reproduced from an existing repo, and going through the work to
manually drive the server is a monumental pain in the butt, and all I
*really* need is to see the bytes sent from the server to the client. If it
weren't for SSL-everywhere, I'd probably just do this with wireshark, but
that's not the world I live in.



 That said, I do think that fetch-pack could learn a mode that makes it
easier to debug the normal behavior of a server (if such a mode is missing
currently).

 What is the problem with the current fetch-pack implementation? Does it
remove a bogus packfile after download? Does it abort during download when
it detects a broken packfile? Does --keep not do what you need?


fetch-pack doesn't store the pack anywhere - it's sending it to index-pack
(or unpack-objects) using --stdin, which means that the raw bytes from the
server currently are never materialized anywhere on disk. Having index-pack
do this is too late, because it's doing things like rewriting the pack
header in a potentially new format.

(Junio also covered this well, thanks!)



 Instead of your approach (which forks off tee to dump a copy of the
packfile), would it not be simpler to add an option --debug-pack (probably
not the best name) that skips the cleanup step when a broken packfile is
detected and prints the name of the downloaded packfile?


 diff --git a/fetch-pack.c b/fetch-pack.c
 index a912935..fe6ba58 100644
 --- a/fetch-pack.c
 +++ b/fetch-pack.c
 @@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args,
 const char *argv[22];
 char keep_arg[256];
 char hdr_arg[256];
 -   const char **av, *cmd_name;
 +   const char **av, *cmd_name, *savepath;
 int do_keep = args-keep_pack;
 struct child_process cmd = CHILD_PROCESS_INIT;
 int ret;
 @@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args,
 cmd.argv = argv;
 av = argv;
 *hdr_arg = 0;
 +   struct pack_header header;
 if (!args-keep_pack  unpack_limit) {
 -   struct pack_header header;
 -
 if (read_pack_header(demux.out, header))
 die(protocol error: bad pack header);
 snprintf(hdr_arg, sizeof(hdr_arg),
 @@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args,
 *av++ = --strict;
 *av++ = NULL;

 -   cmd.in = demux.out;
 +   savepath = getenv(GIT_SAVE_FETCHED_PACK_TO);
 +   if (savepath) {
 +   struct child_process cmd2 = CHILD_PROCESS_INIT;
 +   const char *argv2[22];
 +   int pipefds[2];
 +   int e;
 +   const char **av2;
 +   cmd2.argv = argv2;
 +   av2 = argv2;
 +   *av2++ = tee;
 +   if (*hdr_arg) {
 +   /* hdr_arg being nonempty means we already read
the
 +* pack header from demux, so we need to drop a
pack
 +* header in place for tee to append to,
otherwise
 +* we'll end up with a broken pack on disk.
 +*/


 /*
  * Write multi-line comments
  * like this (/* on its own line)
  */

 +   int fp;
 +   struct sha1file *s;
 +   fp = open(savepath, O_CREAT | O_TRUNC |
O_WRONLY, 0666);
 +   s = sha1fd_throughput(fp, savepath, NULL);
 +   sha1write(s, header, sizeof(header));
 +   sha1flush(s);


 Are you abusing sha1write() and sha1flush() to write a byte sequence to a
file? Is write_in_full() not sufficient?


I didn't know about write_in_full. I'm very much *not* familiar with git's
codebase - I know the protocols and formats reasonably well, but have
needed only occasionally to look at the code. That works, thanks.




 +   close(fp);
 +   /* -a is supported by both GNU and BSD tee */
 +   *av2++ = -a;
 +   }
 +   *av2++ = savepath;
 +   *av2++ = NULL;
 +   cmd2.in = demux.out;
 +   e = pipe(pipefds);
 +  

Re: [PATCH v2] fetch-pack: optionally save packs to disk

2015-06-12 Thread Jeff King
On Fri, Jun 12, 2015 at 08:07:36AM -0700, Junio C Hamano wrote:

 Johannes Sixt j...@kdbg.org writes:
 
  What is the problem with the current fetch-pack implementation? Does
  it remove a bogus packfile after download? Does it abort during
  download when it detects a broken packfile? Does --keep not do what
  you need?
 
 Doesn't the incoming data still go through the fattening process,
 though?  You will not be able to inspect the byte-for-byte identical
 stream that came out of the server end whose packfile generation
 logic is suspect.
 
 For the purpose of debugging your own new server implementation, it
 might be a better approach to capture the pack as it comes out at
 the server end, instead of doing it at the fetch-pack end as it
 comes in. But the approach to add this dump at the fetch-pack side
 is that it gives us a tool to diagnose problems that come from
 broken server (re)implementations by other people we cannot debug,
 i.e. you are spewing this corrupt pack against this request; here
 is a dump we took to help you go fix your server.

When I added GIT_TRACE_PACKET long ago, I had always intended to
follow-up with a GIT_TRACE_PACKFILE. The former stops tracing when we
get to the binary data, but I had intended the latter to store the pure
on-the-wire packfile transmission for later debugging to. I never got
around to it, but I think it is something like the patch below.

With:

  GIT_TRACE_PACKET=1 GIT_TRACE_PACK=/tmp/foo.pack git clone ...

this yields a usable pack in /tmp/foo.pack (note that it only kicks in
when packet-tracing is on at all; if we restructure the code a bit, we
can remove that limitation).

In theory it would also work when receiving a pack via push, but I
think we actually skip the pkt-line protocol there. We'd have to
manually check GIT_TRACE_PACK.

Also, as a bonus, it means we do not stop tracing completely when we
start to receive a sideband pack. The current GIT_TRACE_PACKET code
misses any sideband-2 messages that come after we start receiving the
pack.

diff --git a/pkt-line.c b/pkt-line.c
index 187a229..f82871a 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -4,20 +4,39 @@
 char packet_buffer[LARGE_PACKET_MAX];
 static const char *packet_trace_prefix = git;
 static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
+static struct trace_key trace_pack = TRACE_KEY_INIT(PACK);
 
 void packet_trace_identity(const char *prog)
 {
packet_trace_prefix = xstrdup(prog);
 }
 
+static int packet_trace_pack(const char *buf, unsigned int len, int sideband)
+{
+   if (!sideband) {
+   trace_verbatim(trace_pack, buf, len);
+   return 1;
+   } else if (len  *buf == '\1') {
+   trace_verbatim(trace_pack, buf + 1, len - 1);
+   return 1;
+   } else {
+   /* it's another non-pack sideband */
+   return 0;
+   }
+}
+
 static void packet_trace(const char *buf, unsigned int len, int write)
 {
int i;
struct strbuf out;
+   static int in_pack, sideband;
 
if (!trace_want(trace_packet))
return;
 
+   if (in_pack  packet_trace_pack(buf, len, sideband))
+   return;
+
/* +32 is just a guess for header + quoting */
strbuf_init(out, len+32);
 
@@ -27,7 +46,9 @@ static void packet_trace(const char *buf, unsigned int len, 
int write)
if ((len = 4  starts_with(buf, PACK)) ||
(len = 5  starts_with(buf+1, PACK))) {
strbuf_addstr(out, PACK ...);
-   trace_disable(trace_packet);
+   in_pack = 1;
+   sideband = *buf == '\1';
+   packet_trace_pack(buf, len, sideband);
}
else {
/* XXX we should really handle printable utf8 */
diff --git a/trace.c b/trace.c
index 3c3bd8f..7393926 100644
--- a/trace.c
+++ b/trace.c
@@ -120,6 +120,13 @@ static int prepare_trace_line(const char *file, int line,
return 1;
 }
 
+void trace_verbatim(struct trace_key *key, const void *buf, unsigned len)
+{
+   if (!trace_want(key))
+   return;
+   write_or_whine_pipe(get_trace_fd(key), buf, len, err_msg);
+}
+
 static void print_trace_line(struct trace_key *key, struct strbuf *buf)
 {
strbuf_complete_line(buf);
diff --git a/trace.h b/trace.h
index ae6a332..179b249 100644
--- a/trace.h
+++ b/trace.h
@@ -18,6 +18,7 @@ extern int trace_want(struct trace_key *key);
 extern void trace_disable(struct trace_key *key);
 extern uint64_t getnanotime(void);
 extern void trace_command_performance(const char **argv);
+extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned 
len);
 
 #ifndef HAVE_VARIADIC_MACROS
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] fetch-pack: optionally save packs to disk

2015-06-11 Thread Augie Fackler
When developing server software, it's often helpful to save a
potentially-bogus pack for later analysis. This makes that trivial,
instead of painful. This is made a little complicated by the fact that
in some cases (like cloning from smart-http, but not from a local repo)
the fetch code reads the pack header before sending the pack to
index-pack (which then gets a --pack_header flag). The included tests
cover both of these cases.

To use the new feature, set GIT_SAVE_FETCHED_PACK_TO to a file path and
git-fetch will do the rest. The resulting pack can be examined with
git-index-pack or similar tools (although if it's corrupt, custom tools
can be especially helpful.)

Signed-off-by: Augie Fackler au...@google.com
---
 Documentation/git.txt   |  6 ++
 fetch-pack.c| 44 
 t/t5551-http-fetch-smart.sh | 12 
 t/t5601-clone.sh|  9 +
 4 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 45b64a7..31bc3b5 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -1060,6 +1060,12 @@ GIT_ICASE_PATHSPECS::
an operation has touched every ref (e.g., because you are
cloning a repository to make a backup).
 
+`GIT_SAVE_FETCHED_PACK_TO`::
+   If set, save any fetched pack to the path in the
+   variable. This is mostly useful if you're writing a custom
+   server and are producing broken packs, as the saved pack won't
+   be cleaned up even if it's corrupt.
+
 
 Discussion[[Discussion]]
 
diff --git a/fetch-pack.c b/fetch-pack.c
index a912935..fe6ba58 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args,
const char *argv[22];
char keep_arg[256];
char hdr_arg[256];
-   const char **av, *cmd_name;
+   const char **av, *cmd_name, *savepath;
int do_keep = args-keep_pack;
struct child_process cmd = CHILD_PROCESS_INIT;
int ret;
@@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args,
cmd.argv = argv;
av = argv;
*hdr_arg = 0;
+   struct pack_header header;
if (!args-keep_pack  unpack_limit) {
-   struct pack_header header;
-
if (read_pack_header(demux.out, header))
die(protocol error: bad pack header);
snprintf(hdr_arg, sizeof(hdr_arg),
@@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args,
*av++ = --strict;
*av++ = NULL;
 
-   cmd.in = demux.out;
+   savepath = getenv(GIT_SAVE_FETCHED_PACK_TO);
+   if (savepath) {
+   struct child_process cmd2 = CHILD_PROCESS_INIT;
+   const char *argv2[22];
+   int pipefds[2];
+   int e;
+   const char **av2;
+   cmd2.argv = argv2;
+   av2 = argv2;
+   *av2++ = tee;
+   if (*hdr_arg) {
+   /* hdr_arg being nonempty means we already read the
+* pack header from demux, so we need to drop a pack
+* header in place for tee to append to, otherwise
+* we'll end up with a broken pack on disk.
+*/
+   int fp;
+   struct sha1file *s;
+   fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666);
+   s = sha1fd_throughput(fp, savepath, NULL);
+   sha1write(s, header, sizeof(header));
+   sha1flush(s);
+   close(fp);
+   /* -a is supported by both GNU and BSD tee */
+   *av2++ = -a;
+   }
+   *av2++ = savepath;
+   *av2++ = NULL;
+   cmd2.in = demux.out;
+   e = pipe(pipefds);
+   if (e != 0)
+   die(couldn't make pipe to save pack);
+   cmd2.out = pipefds[1];
+   cmd.in = pipefds[0];
+   if (start_command(cmd2))
+   die(couldn't start tee to save a pack);
+   } else
+   cmd.in = demux.out;
cmd.git_cmd = 1;
if (start_command(cmd))
die(fetch-pack: unable to fork off %s, cmd_name);
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 58207d8..bf4640d 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -82,11 +82,23 @@ test_expect_success 'fetch changes via http' '
test_cmp file clone/file
 '
 
+test_expect_success 'fetch changes via http and save pack' '
+   echo content file 
+   git commit -a -m two 
+   git push public 
+   GIT_SAVE_FETCHED_PACK_TO=saved.pack 
+   export GIT_SAVE_FETCHED_PACK_TO 
+   (cd clone  git pull) 
+   git