Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-11 Thread Jeff King
On Mon, Jun 11, 2018 at 05:18:13AM -0400, Jeff King wrote:

> > >> +sleep 1; # is interrupted by SIGCHLD
> > >> +if (!$exited) {
> > >> +close($out);
> > >> +die "Command did not exit after reading whole body";
> > >> +}
> > 
> > > Also, do we need to protect ourselves against other signals being
> > > delivered? E.g., if I resize my xterm and this process gets SIGWINCH, is
> > > it going to erroneously end the sleep and say "nope, no exited signal"?
> > 
> > I'll check, but what could I do? Should I add blocking other
> > signals there?
> 
> I think a more robust check may be to waitpid() on the child for up to N
> seconds. Something like this:
> 
>   $SIG{ALRM} = sub {
> kill(9, $pid);
> die "command did not exit after reading whole body"
>   };
>   alarm(60);
>   waitpid($pid, 0);
>   alarm(0);
> 
> That should exit immediately if $pid does, and otherwise die after
> exactly 60 seconds. Perl's waitpid implementation will restart
> automatically if it gets another signal.

I tried your original, delivering some signals to it. I think it
actually is OK, too, because perl's sleep() implementation will also
restart for something like SIGWINCH.

E.g., stracing looks like this:

  nanosleep({tv_sec=60, tv_nsec=0}, {tv_sec=57, tv_nsec=791891377}) = ? 
ERESTART_RESTARTBLOCK (Interrupted by signal)
  --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
  restart_syscall(<... resuming interrupted nanosleep ...>

-Peff


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-11 Thread Jeff King
On Tue, Jun 05, 2018 at 01:18:08AM +0300, Max Kirillov wrote:

> > On Sun, Jun 03, 2018 at 12:27:49AM +0300, Max Kirillov wrote:
> > Since this is slightly less efficient, and because it only matters if
> > the web server does not already close the pipe, should this have a
> > run-time configuration knob, even if it defaults to
> > safe-but-slightly-slower?
> 
> Personally, I of course don't want this. Also, I don't think
> the difference is much noticeable. But you can never be sure
> without trying. I'll try to measure some numbers.

I don't know if it will matter or not. I just wonder if we want to leave
an escape hatch for people who might. I could take or leave it.

> Actually, it is already 3rd same error in this file. Maybe
> deserve some refactoring. I will change the message also.

Thanks, that kind of related cleanup is very welcome.

> > We generally prefer to have all commands, even ones we don't expect to
> > fail, inside test_expect blocks (e.g., with a "setup" description).
> 
> Will the defined variables get to the next test? I'll try to
> do as you describe.

Yes, the tests are all run as evals. So as long as you don't open a
subshell yourself, any changes you make to process state will persist.

> >> +test_expect_success 'fetch plain truncated' '
> >> +  test_http_env upload \
> >> +  "$TEST_DIRECTORY"/t5562/invoke-with-content-length.pl 
> >> fetch_body.trunc git http-backend >act.out 2>act.err &&
> >> +  test_must_fail verify_http_result "200 OK"
> >> +'
> > 
> > Usually test_must_fail on a checking function like this is a sign that
> > the check is not as robust as we'd like. If the function checks two
> > things "A && B", then checking test_must_fail will only let us know
> > "!A || !B", but you probably want to check both.
> 
> Well here I just want to know that the request has failed,
> and we already know that it can fail in different ways,
> but the test is not going to differentiate those ways.

OK, looking over your verify_http_result function, I _think_ we are OK
here, because the only && is against a printf, which we wouldn't really
expect to fail.

> >> +sleep 1; # is interrupted by SIGCHLD
> >> +if (!$exited) {
> >> +close($out);
> >> +die "Command did not exit after reading whole body";
> >> +}
> 
> > Also, do we need to protect ourselves against other signals being
> > delivered? E.g., if I resize my xterm and this process gets SIGWINCH, is
> > it going to erroneously end the sleep and say "nope, no exited signal"?
> 
> I'll check, but what could I do? Should I add blocking other
> signals there?

I think a more robust check may be to waitpid() on the child for up to N
seconds. Something like this:

  $SIG{ALRM} = sub {
  kill(9, $pid);
  die "command did not exit after reading whole body"
  };
  alarm(60);
  waitpid($pid, 0);
  alarm(0);

That should exit immediately if $pid does, and otherwise die after
exactly 60 seconds. Perl's waitpid implementation will restart
automatically if it gets another signal.

-Peff


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-11 Thread Jeff King
On Sun, Jun 10, 2018 at 06:07:27PM +0300, Max Kirillov wrote:

> On Mon, Jun 04, 2018 at 12:44:09AM -0400, Jeff King wrote:
> > On Sun, Jun 03, 2018 at 12:27:49AM +0300, Max Kirillov wrote:
> >> +  env \
> >> +  CONTENT_TYPE=application/x-git-upload-pack-request \
> >> +  QUERY_STRING=/repo.git/git-upload-pack \
> >> +  PATH_TRANSLATED="$PWD"/.git/git-upload-pack \
> >> +  GIT_HTTP_EXPORT_ALL=TRUE \
> >> +  REQUEST_METHOD=POST \
> >> +  CONTENT_LENGTH="$NOT_FIT_IN_SSIZE" \
> >> +  git http-backend /dev/null 2>err &&
> >> +  grep -q "fatal:.*CONTENT_LENGTH" err
> > 
> > I'm not sure if these messages should be marked for translation. If so,
> > you'd want test_i18ngrep here.
> 
> Message localization does not seem to be used in
> http-backend at all. It makes sense - server-side software
> probably does not know who is the user on the other side, if
> the message gets to the user at all. So, I think the
> message should not be translated.

OK. I think there's been talk of localizing "fatal:", but whoever does
that patch would have to deal with fallout all over the test-suite. I
don't think we need to worry about it yet.

-Peff


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-10 Thread Max Kirillov
On Mon, Jun 04, 2018 at 12:44:09AM -0400, Jeff King wrote:
> On Sun, Jun 03, 2018 at 12:27:49AM +0300, Max Kirillov wrote:
>> +env \
>> +CONTENT_TYPE=application/x-git-upload-pack-request \
>> +QUERY_STRING=/repo.git/git-upload-pack \
>> +PATH_TRANSLATED="$PWD"/.git/git-upload-pack \
>> +GIT_HTTP_EXPORT_ALL=TRUE \
>> +REQUEST_METHOD=POST \
>> +CONTENT_LENGTH="$NOT_FIT_IN_SSIZE" \
>> +git http-backend /dev/null 2>err &&
>> +grep -q "fatal:.*CONTENT_LENGTH" err
> 
> I'm not sure if these messages should be marked for translation. If so,
> you'd want test_i18ngrep here.

Message localization does not seem to be used in
http-backend at all. It makes sense - server-side software
probably does not know who is the user on the other side, if
the message gets to the user at all. So, I think the
message should not be translated.


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-10 Thread Max Kirillov
On Tue, Jun 05, 2018 at 01:18:08AM +0300, Max Kirillov wrote:
> On Mon, Jun 04, 2018 at 12:44:09AM -0400, Jeff King wrote:
>> Since this is slightly less efficient, and because it only matters if
>> the web server does not already close the pipe, should this have a
>> run-time configuration knob, even if it defaults to
>> safe-but-slightly-slower?
> 
> Personally, I of course don't want this. Also, I don't think
> the difference is much noticeable. But you can never be sure
> without trying. I'll try to measure some numbers.

It seems to be challenging to see any effect at my system.
At least not with any real operation because changing
references needs IO and index-pack needs CPU so. I'll try
it some more.

>> We should probably say something more generic like:
>> 
>>   die_errno("unable to write to '%s'");
>> 
>> or similar.
> 
> Actually, it is already 3rd same error in this file. Maybe
> deserve some refactoring. I will change the message also.

Extracted the writing and refactoring to a single function,
also fixed the message.

>>> +cat >fetch_body <>> +0032want $hash_head
>>> +0032have $hash_prev
>>> +0009done
>>> +EOF
>> 
>> This depends on the size of the hash. That's always 40 for now, but is
>> something that may change soon.
>> 
>> We already have a packetize() helper; could we use it here?

> Could you point me to it? I cannot find it.

Sorry, misread it as packetSize. Found and used.

>> Also, do we need to protect ourselves against other signals being
>> delivered? E.g., if I resize my xterm and this process gets SIGWINCH, is
>> it going to erroneously end the sleep and say "nope, no exited signal"?

> I'll check, but what could I do? Should I add blocking other
> signals there?

In my Linux I don't see the signal. Except that, there seem to
be not that many ignored signals. Anyway, I don't see what
could be done bout it.


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-04 Thread Ramsay Jones



On 04/06/18 18:06, Max Kirillov wrote:
> On Mon, Jun 04, 2018 at 01:31:58PM +0900, Junio C Hamano wrote:
>> Max Kirillov  writes:
>>> +   size_t n = xread(0, buf, chunk_length);
>>> +   if (n < 0)
>>> +   die_errno("Reading request failed");
>>
>> n that is of type size_t is unsigned and cannot be negative here.
> 

Hmm, xread() returns an ssize_t, which is a signed type ...

> Thanks, fixing it
> Do you think sanity check for n <= chunk_length (the code
> will go mad in this case) is needed? As far as I can see n
> returns straight from system's read()

ATB,
Ramsay Jones



Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-04 Thread Max Kirillov
On Mon, Jun 04, 2018 at 12:44:09AM -0400, Jeff King wrote:

Thanks for the comments, I will do the things you proposed,
or try to and get back later if there are any issues. Some
notes below.

> On Sun, Jun 03, 2018 at 12:27:49AM +0300, Max Kirillov wrote:
> Since this is slightly less efficient, and because it only matters if
> the web server does not already close the pipe, should this have a
> run-time configuration knob, even if it defaults to
> safe-but-slightly-slower?

Personally, I of course don't want this. Also, I don't think
the difference is much noticeable. But you can never be sure
without trying. I'll try to measure some numbers.

>> +if (write_in_full(out, buf, n) < 0)
>> +die_errno("%s aborted reading request", prog_name);
> 
> We don't necessarily know why the write failed. If it's EPIPE, then yes,
> the program probably did abort. But all we know is that write() failed.
> We should probably say something more generic like:
> 
>   die_errno("unable to write to '%s'");
> 
> or similar.

Actually, it is already 3rd same error in this file. Maybe
deserve some refactoring. I will change the message also.

>> +test_expect_success 'setup repository' '
>> +test_commit c0 &&
>> +test_commit c1
>> +'
>> +
>> +hash_head=$(git rev-parse HEAD)
>> +hash_prev=$(git rev-parse HEAD~1)
> 
> We generally prefer to have all commands, even ones we don't expect to
> fail, inside test_expect blocks (e.g., with a "setup" description).

Will the defined variables get to the next test? I'll try to
do as you describe.

>> +cat >fetch_body <> +0032want $hash_head
>> +0032have $hash_prev
>> +0009done
>> +EOF
> 
> This depends on the size of the hash. That's always 40 for now, but is
> something that may change soon.
> 
> We already have a packetize() helper; could we use it here?

Could you point me to it? I cannot find it.

My understanfing is that the current protocol assumes
40 symbols hash, so another hash length would be another
protocol, and since it's manually forged here it would
anyway has to be changeda.

>> +test_expect_success 'fetch plain truncated' '
>> +test_http_env upload \
>> +"$TEST_DIRECTORY"/t5562/invoke-with-content-length.pl 
>> fetch_body.trunc git http-backend >act.out 2>act.err &&
>> +test_must_fail verify_http_result "200 OK"
>> +'
> 
> Usually test_must_fail on a checking function like this is a sign that
> the check is not as robust as we'd like. If the function checks two
> things "A && B", then checking test_must_fail will only let us know
> "!A || !B", but you probably want to check both.

Well here I just want to know that the request has failed,
and we already know that it can fail in different ways,
but the test is not going to differentiate those ways.

> (We'd also generally not use test_must_fail with a non-git command, and
> just use a simple "! verify_http_result"; that would apply equally if
> gets split into two commands).

Will use ! there.

>> +sleep 1; # is interrupted by SIGCHLD
>> +if (!$exited) {
>> +close($out);
>> +die "Command did not exit after reading whole body";
>> +}

...

> Also, do we need to protect ourselves against other signals being
> delivered? E.g., if I resize my xterm and this process gets SIGWINCH, is
> it going to erroneously end the sleep and say "nope, no exited signal"?

I'll check, but what could I do? Should I add blocking other
signals there?


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-04 Thread Max Kirillov
On Mon, Jun 04, 2018 at 01:31:58PM +0900, Junio C Hamano wrote:
> Max Kirillov  writes:
>> +size_t n = xread(0, buf, chunk_length);
>> +if (n < 0)
>> +die_errno("Reading request failed");
> 
> n that is of type size_t is unsigned and cannot be negative here.

Thanks, fixing it
Do you think sanity check for n <= chunk_length (the code
will go mad in this case) is needed? As far as I can see n
returns straight from system's read()


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-03 Thread Jeff King
On Sun, Jun 03, 2018 at 12:27:49AM +0300, Max Kirillov wrote:

> Push passes to another commands, as described in
> https://public-inbox.org/git/20171129032214.gb32...@sigill.intra.peff.net/
> 
> As it gets complicated to correctly track the data length, instead transfer
> the data through parent process and cut the pipe as the specified length is
> reached. Do it only when CONTENT_LENGTH is set, otherwise pass the input
> directly to the forked commands.

I think this approach is reasonable. It's basically converting the
known-length case to a read-to-eof case for the sub-program, which
should paper over any problems of this type. And it's what we really
_want_ the web server to be doing in the first place.

Since this is slightly less efficient, and because it only matters if
the web server does not already close the pipe, should this have a
run-time configuration knob, even if it defaults to
safe-but-slightly-slower?

I admit I don't overly care that much myself (the only large-scale Git
server deployment I am personally familiar with does not use
git-http-backend at all), but it might be nice to leave an escape hatch.

There are a few things in the patch worth fixing, but overall I think it
looks like a pretty good direction. Comments inline.

> diff --git a/http-backend.c b/http-backend.c
> index 3066697a24..78a588c551 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -351,23 +351,22 @@ static ssize_t get_content_length(void)
>   return val;
>  }
>  
> -static ssize_t read_request(int fd, unsigned char **out)
> +static ssize_t read_request(int fd, unsigned char **out, ssize_t req_len)
>  {
> - ssize_t req_len = get_content_length();
> -
>   if (req_len < 0)
>   return read_request_eof(fd, out);
>   else
>   return read_request_fixed_len(fd, req_len, out);
>  }

Minor nit, but it might have been nice to build in this infrastructure
in the first patch, rather than refactoring it here. It would also make
it much more obvious that the first one is not handling some cases,
since we'd have "req_len" but not pass it to all of the code paths. ;)

> @@ -379,11 +378,18 @@ static void inflate_request(const char *prog_name, int 
> out, int buffer_input)
>   if (full_request)
>   n = 0; /* nothing left to read */
>   else
> - n = read_request(0, _request);
> + n = read_request(0, _request, req_len);
>   stream.next_in = full_request;
>   } else {
> - n = xread(0, in_buf, sizeof(in_buf));
> + ssize_t buffer_len;
> + if (req_remaining_len < 0 || req_remaining_len > 
> sizeof(in_buf))
> + buffer_len = sizeof(in_buf);
> + else
> + buffer_len = req_remaining_len;
> + n = xread(0, in_buf, buffer_len);
>   stream.next_in = in_buf;
> + if (req_remaining_len >= 0)
> + req_remaining_len -= n;
>   }

What happens here if xread() returns an error? We probably don't want to
modify req_remaining_len (it probably doesn't matter since we'd report
the errot after this, but it feels funny not to check here).

> +static void pipe_fixed_length(const char *prog_name, int out, size_t req_len)
> +{
> + unsigned char buf[8192];
> + size_t remaining_len = req_len;
> +
> + while (remaining_len > 0) {
> + size_t chunk_length = remaining_len > sizeof(buf) ? sizeof(buf) 
> : remaining_len;
> + size_t n = xread(0, buf, chunk_length);
> + if (n < 0)
> + die_errno("Reading request failed");

I was going to complain that we usually start our error messages with a
lowercase, but this program seems to be an exception. So here you've
followed the local custom, which is OK.

> + if (write_in_full(out, buf, n) < 0)
> + die_errno("%s aborted reading request", prog_name);

We don't necessarily know why the write failed. If it's EPIPE, then yes,
the program probably did abort. But all we know is that write() failed.
We should probably say something more generic like:

  die_errno("unable to write to '%s'");

or similar.

> diff --git a/t/helper/test-print-larger-than-ssize.c 
> b/t/helper/test-print-larger-than-ssize.c
> new file mode 100644
> index 00..83472a32f1
> --- /dev/null
> +++ b/t/helper/test-print-larger-than-ssize.c
> @@ -0,0 +1,11 @@
> +#include "test-tool.h"
> +#include "cache.h"
> +
> +int cmd__print_larger_than_ssize(int argc, const char **argv)
> +{
> + size_t large = ~0;
> +
> + large = ~(large & ~(large >> 1)) + 1;
> + printf("%" PRIuMAX "\n", (uintmax_t) large);
> + return 0;
> +}

I think this might be nicer as part of "git version --build-options".
Either as a byte-size as I showed in 

Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-03 Thread Junio C Hamano
Max Kirillov  writes:

> +static void pipe_fixed_length(const char *prog_name, int out, size_t req_len)
> +{
> + unsigned char buf[8192];
> + size_t remaining_len = req_len;
> +
> + while (remaining_len > 0) {
> + size_t chunk_length = remaining_len > sizeof(buf) ? sizeof(buf) 
> : remaining_len;
> + size_t n = xread(0, buf, chunk_length);
> + if (n < 0)
> + die_errno("Reading request failed");

n that is of type size_t is unsigned and cannot be negative here.


[PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-02 Thread Max Kirillov
Push passes to another commands, as described in
https://public-inbox.org/git/20171129032214.gb32...@sigill.intra.peff.net/

As it gets complicated to correctly track the data length, instead transfer
the data through parent process and cut the pipe as the specified length is
reached. Do it only when CONTENT_LENGTH is set, otherwise pass the input
directly to the forked commands.

Add tests for cases:

* CONTENT_LENGTH is set, script's stdin has more data, with all combinations
  of variations: fetch or push, plain or compressed body, correct or truncated
  input.

* CONTENT_LENGTH is specified to a value which does not fit into ssize_t.

Helped-by: Junio C Hamano 
Signed-off-by: Max Kirillov 
---
 Makefile|   1 +
 http-backend.c  |  49 ++--
 t/helper/test-print-larger-than-ssize.c |  11 ++
 t/helper/test-tool.c|   1 +
 t/helper/test-tool.h|   1 +
 t/t5560-http-backend-noserver.sh|  13 ++
 t/t5562-http-backend-content-length.sh  | 155 
 t/t5562/invoke-with-content-length.pl   |  30 +
 8 files changed, 250 insertions(+), 11 deletions(-)
 create mode 100644 t/helper/test-print-larger-than-ssize.c
 create mode 100755 t/t5562-http-backend-content-length.sh
 create mode 100755 t/t5562/invoke-with-content-length.pl

diff --git a/Makefile b/Makefile
index f181687250..93dc4bc23b 100644
--- a/Makefile
+++ b/Makefile
@@ -678,6 +678,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o
 TEST_BUILTINS_OBJS += test-mktemp.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
+TEST_BUILTINS_OBJS += test-print-larger-than-ssize.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-ref-store.o
diff --git a/http-backend.c b/http-backend.c
index 3066697a24..78a588c551 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -351,23 +351,22 @@ static ssize_t get_content_length(void)
return val;
 }
 
-static ssize_t read_request(int fd, unsigned char **out)
+static ssize_t read_request(int fd, unsigned char **out, ssize_t req_len)
 {
-   ssize_t req_len = get_content_length();
-
if (req_len < 0)
return read_request_eof(fd, out);
else
return read_request_fixed_len(fd, req_len, out);
 }
 
-static void inflate_request(const char *prog_name, int out, int buffer_input)
+static void inflate_request(const char *prog_name, int out, int buffer_input, 
ssize_t req_len)
 {
git_zstream stream;
unsigned char *full_request = NULL;
unsigned char in_buf[8192];
unsigned char out_buf[8192];
unsigned long cnt = 0;
+   ssize_t req_remaining_len = req_len;
 
memset(, 0, sizeof(stream));
git_inflate_init_gzip_only();
@@ -379,11 +378,18 @@ static void inflate_request(const char *prog_name, int 
out, int buffer_input)
if (full_request)
n = 0; /* nothing left to read */
else
-   n = read_request(0, _request);
+   n = read_request(0, _request, req_len);
stream.next_in = full_request;
} else {
-   n = xread(0, in_buf, sizeof(in_buf));
+   ssize_t buffer_len;
+   if (req_remaining_len < 0 || req_remaining_len > 
sizeof(in_buf))
+   buffer_len = sizeof(in_buf);
+   else
+   buffer_len = req_remaining_len;
+   n = xread(0, in_buf, buffer_len);
stream.next_in = in_buf;
+   if (req_remaining_len >= 0)
+   req_remaining_len -= n;
}
 
if (n <= 0)
@@ -416,10 +422,10 @@ static void inflate_request(const char *prog_name, int 
out, int buffer_input)
free(full_request);
 }
 
-static void copy_request(const char *prog_name, int out)
+static void copy_request(const char *prog_name, int out, ssize_t req_len)
 {
unsigned char *buf;
-   ssize_t n = read_request(0, );
+   ssize_t n = read_request(0, , req_len);
if (n < 0)
die_errno("error reading request body");
if (write_in_full(out, buf, n) < 0)
@@ -428,6 +434,24 @@ static void copy_request(const char *prog_name, int out)
free(buf);
 }
 
+static void pipe_fixed_length(const char *prog_name, int out, size_t req_len)
+{
+   unsigned char buf[8192];
+   size_t remaining_len = req_len;
+
+   while (remaining_len > 0) {
+   size_t chunk_length = remaining_len > sizeof(buf) ? sizeof(buf) 
: remaining_len;
+   size_t n = xread(0, buf, chunk_length);
+   if (n < 0)
+   die_errno("Reading request failed");
+   if (write_in_full(out, buf, n)