Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Jeff King
On Thu, Sep 17, 2015 at 10:54:39AM -0700, Junio C Hamano wrote:

> OK.  Trying to repurpose strbuf_read() for non-blocking FD was a
> silly idea to begin with, as it wants to read thru to the EOF, and
> setting FD explicitly to O_NONBLOCK is a sign that the caller wants
> to grab as much as possible and does not want to wait for the EOF.
> 
> So assuming we want strbuf_read_nonblock(), what interface do we
> want from it?  We could:
> 
>  * Have it grab as much as possible into sb as long as it does not
>block?
> 
>  * Have it grab reasonably large amount into sb, and not blocking is
>an absolute requirement, but the function is not required to read
>everything that is available on the FD (i.e. the caller is
>expected to loop)?

I think we are crossing emails, but I would definitely argue for the
latter.

> If we choose the latter, then your "EAGAIN? EOF?" becomes easier,
> no?  We only have to do a single call to xread(), and then we:
> 
>  - get EAGAIN or EWOULDBLOCK; leave sb as-is, set errno==EAGAIN and
>return -1.
> 
>  - get something (in which case that is not an EOF yet); append to
>sb, return the number of bytes.
> 
>  - get EOF; leave sb as-is, return 0.

Yes, exactly.

> ssize_t strbuf_read_nonblock(struct strbuf *sb, int fd, size_t hint)
> {
>   strbuf_grow(sb, hint ? hint : 8192);
>   ssize_t want = sb->alloc - sb->len - 1;
>   ssize_t got = xread_nonblock(fd, sb->buf + sb->len, want);
>   if (got < 0) {
>   if (errno == EWOULDBLOCK)
>   errno = EAGAIN; /* make life easier for the caller */
>   return got;
>   }

I like the normalizing of errno, but that should probably be part of
xread_nonblock(), I would think.

>   sb->len += got;
>   sb->buf[sb->len] = '\0';

Use strbuf_setlen() here?

>   return got;

If "got == 0", we naturally do not change sb->len at all, and the strbuf
is left unchanged. But do we want to de-allocate what we allocated in
strbuf_grow() above? That is what strbuf_read() does, but I think it is
even less likely to help anybody here.  With the original strbuf_read(),
you might do:

  if (strbuf_read(&buf, fd, hint) <= 0)
return; /* got nothing */

but because the nature of strbuf_read_nonblock() is to call it from a
loop, you'd want to strbuf_release() when you leave the loop anyway.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Junio C Hamano
Jeff King  writes:

> I actually wonder if callers who are _expecting_ non-blocking want to
> loop in strbuf_read() at all.
>
> strbuf_read() is really about reading to EOF, and growing the buffer to
> fit all of the input. But that's not at all what you want to do with
> non-blocking. There you believe for some reason that data may be
> available (probably due to poll), and you want to read one chunk of it,
> maybe act, and then go back to polling.

I think I could have gone to grab a cup of coffee instead of typing
another message that essentially said the same thing ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Junio C Hamano
Jeff King  writes:

> I think they have to loop for correctness, but they may do this:
>
>   if (xread(fd, buf, len) < 0)
>   die_errno("OMG, an error!");
>
> which is not correct if "fd" is unknowingly non-blocking.
> ...
> The spinning behavior is not great, but does mean that we spin and
> continue rather than bailing with an error.

OK, that is a problem (I just read through "git grep xread -- \*.c".
There aren't that many codepaths that read from a fd that we didn't
open ourselves, but there indeed are some.

So it seems that we would want a xread() that spins to help such
codepaths, and xread_nonblock() that gives a short-read upon
EWOULDBLOCK.  They can share a helper function, of course.

> If we reset errno to "0" at the top of the function, we could get around
> one problem, but it still makes an annoying interface: the caller has to
> check errno for any 0-return to figure out if it was really EOF, or just
> EAGAIN.
>
> If we return -1, though, we have a similar annoyance. If the caller
> notices a -1 return value and finds EAGAIN, they still may need to check
> sb->len to see if they made forward progress and have data they should
> be dealing with.

OK.  Trying to repurpose strbuf_read() for non-blocking FD was a
silly idea to begin with, as it wants to read thru to the EOF, and
setting FD explicitly to O_NONBLOCK is a sign that the caller wants
to grab as much as possible and does not want to wait for the EOF.

So assuming we want strbuf_read_nonblock(), what interface do we
want from it?  We could:

 * Have it grab as much as possible into sb as long as it does not
   block?

 * Have it grab reasonably large amount into sb, and not blocking is
   an absolute requirement, but the function is not required to read
   everything that is available on the FD (i.e. the caller is
   expected to loop)?

If we choose the latter, then your "EAGAIN? EOF?" becomes easier,
no?  We only have to do a single call to xread(), and then we:

 - get EAGAIN or EWOULDBLOCK; leave sb as-is, set errno==EAGAIN and
   return -1.

 - get something (in which case that is not an EOF yet); append to
   sb, return the number of bytes.

 - get EOF; leave sb as-is, return 0.


ssize_t strbuf_read_nonblock(struct strbuf *sb, int fd, size_t hint)
{
strbuf_grow(sb, hint ? hint : 8192);
ssize_t want = sb->alloc - sb->len - 1;
ssize_t got = xread_nonblock(fd, sb->buf + sb->len, want);
if (got < 0) {
if (errno == EWOULDBLOCK)
errno = EAGAIN; /* make life easier for the caller */
return got;
}
sb->len += got;
sb->buf[sb->len] = '\0';
return got;
}
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Stefan Beller
On Thu, Sep 17, 2015 at 10:50 AM, Jeff King  wrote:
> On Thu, Sep 17, 2015 at 10:45:40AM -0700, Stefan Beller wrote:
>
>> > You _can_ loop on read until you hit EAGAIN, but I think in general you
>> > shouldn't; if you get a lot of input on this fd, you'll starve all of
>> > the other descriptors you're polling.  You're better off to read a
>> > finite amount from each descriptor, and then check again who is ready to
>> > read.
>>
>> That's what I do with the current implementation. Except it's not as clear 
>> and
>> concise as I patched it into the strbuf_read.
>
> Is it? I thought the implementation you posted bumped the existing
> strbuf_read to strbuf_buf_internal, including the loop. So as long as we
> are not getting EAGAIN, it will keep reading forever.

You'll get EAGAIN pretty fast though, as all you do is reading as fast
as you can.

> Actually not quite
> true, as any read shorter than 8192 bytes will cause us to jump out of
> the loop, too, but if we assume that the caller is feeding us data
> faster than we can read it, we'll never exit strbuf_read_nonblock() and
> serve any of the other descriptors.

I see the difference now. That makes sense.

>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Jeff King
On Thu, Sep 17, 2015 at 10:45:40AM -0700, Stefan Beller wrote:

> > You _can_ loop on read until you hit EAGAIN, but I think in general you
> > shouldn't; if you get a lot of input on this fd, you'll starve all of
> > the other descriptors you're polling.  You're better off to read a
> > finite amount from each descriptor, and then check again who is ready to
> > read.
> 
> That's what I do with the current implementation. Except it's not as clear and
> concise as I patched it into the strbuf_read.

Is it? I thought the implementation you posted bumped the existing
strbuf_read to strbuf_buf_internal, including the loop. So as long as we
are not getting EAGAIN, it will keep reading forever. Actually not quite
true, as any read shorter than 8192 bytes will cause us to jump out of
the loop, too, but if we assume that the caller is feeding us data
faster than we can read it, we'll never exit strbuf_read_nonblock() and
serve any of the other descriptors.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Stefan Beller
On Thu, Sep 17, 2015 at 10:35 AM, Jeff King  wrote:
>

>
> You _can_ loop on read until you hit EAGAIN, but I think in general you
> shouldn't; if you get a lot of input on this fd, you'll starve all of
> the other descriptors you're polling.  You're better off to read a
> finite amount from each descriptor, and then check again who is ready to
> read.

That's what I do with the current implementation. Except it's not as clear and
concise as I patched it into the strbuf_read.

>
> And then the return value becomes a no-brainer, because it's just the
> return value of read(). Either you got some data, you got EOF, or you
> get an error, which might be EAGAIN. You never have the case where you
> got some data, but then you also got EOF and EAGAIN, and the caller has
> to figure out which.
>
> So I think you really want something like:
>
>   ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
>   {
> ssize_t cnt;
>
> strbuf_grow(hint ? hint : 8192);
> cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
> if (cnt > 0)
> strbuf_setlen(sb->len + cnt);
> return cnt;
>   }
>
> (where I'm assuming xread passes us back EAGAIN; we could also replace
> it with read and loop on EINTR ourselves).

Yeah that's exactly what I am looking for (the hint may even be over
engineered here, as I have no clue how much data comes back).

So I guess I could just use that new method now.


> I actually wonder if callers who are _expecting_ non-blocking want to
> loop in strbuf_read() at all.
>
> strbuf_read() is really about reading to EOF, and growing the buffer to
> fit all of the input. But that's not at all what you want to do with
> non-blocking. There you believe for some reason that data may be
> available (probably due to poll), and you want to read one chunk of it,
> maybe act, and then go back to polling.

As a micro project (leftover bit maybe?):
When strbuf_read is reading data out from a non blocking pipe, we're currently
spinning (with the EAGAIN/EWOULDBLOCK). Introduce a call to poll
to reduce the spinning.

>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Jeff King
On Thu, Sep 17, 2015 at 10:26:19AM -0700, Stefan Beller wrote:

> > If we return -1, though, we have a similar annoyance. If the caller
> > notices a -1 return value and finds EAGAIN, they still may need to check
> > sb->len to see if they made forward progress and have data they should
> > be dealing with.
> 
> If errno == EAGAIN, we know it is a non blocking fd, so we could encode
> the length read as (- 2 - length), such that
> 
> ...-2 the length read if EAGAIN was ignored
> -1 for error, check errno!
> 0 for EOF
> +1... length if we just read it or restarted it due to EINTR.
> 
> The call site should know if it is non blocking (i.e. if the <-2 case can
> happen) and handle it appropriately.

I actually wonder if callers who are _expecting_ non-blocking want to
loop in strbuf_read() at all.

strbuf_read() is really about reading to EOF, and growing the buffer to
fit all of the input. But that's not at all what you want to do with
non-blocking. There you believe for some reason that data may be
available (probably due to poll), and you want to read one chunk of it,
maybe act, and then go back to polling.

You _can_ loop on read until you hit EAGAIN, but I think in general you
shouldn't; if you get a lot of input on this fd, you'll starve all of
the other descriptors you're polling.  You're better off to read a
finite amount from each descriptor, and then check again who is ready to
read.

And then the return value becomes a no-brainer, because it's just the
return value of read(). Either you got some data, you got EOF, or you
get an error, which might be EAGAIN. You never have the case where you
got some data, but then you also got EOF and EAGAIN, and the caller has
to figure out which.

So I think you really want something like:

  ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
  {
ssize_t cnt;

strbuf_grow(hint ? hint : 8192);
cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
if (cnt > 0)
strbuf_setlen(sb->len + cnt);
return cnt;
  }

(where I'm assuming xread passes us back EAGAIN; we could also replace
it with read and loop on EINTR ourselves).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Stefan Beller
On Thu, Sep 17, 2015 at 10:13 AM, Jeff King  wrote:
> On Thu, Sep 17, 2015 at 09:58:00AM -0700, Junio C Hamano wrote:
>
>> > Certainly anybody who does not realize their descriptor is O_NONBLOCK
>> > and is using the spinning for correctness. I tend to think that such
>> > sites are wrong, though, and would benefit from us realizing they are
>> > spinning.
>>
>> With or without O_NONBLOCK, not looping around xread() _and_ relying
>> on the spinning for "correctness" means the caller is not getting
>> correctness anyway, I think, because xread() does return a short
>> read, and we deliberately and explicitly do so since 0b6806b9
>> (xread, xwrite: limit size of IO to 8MB, 2013-08-20).
>
> I think they have to loop for correctness, but they may do this:
>
>   if (xread(fd, buf, len) < 0)
> die_errno("OMG, an error!");
>
> which is not correct if "fd" is unknowingly non-blocking. As Stefan
> mentioned, we do not set O_NONBLOCK ourselves very much, but I wonder if
> we could inherit it from the environment in some cases.
>
> The spinning behavior is not great, but does mean that we spin and
> continue rather than bailing with an error.
>
>> > But I think you can't quite get away with leaving strbuf_read untouched
>> > in this case. On error, it wants to restore the original value of the
>> > strbuf before the strbuf_read call. Which means that we throw away
>> > anything read into the strbuf before we get EAGAIN, and the caller never
>> > gets to see it.
>>
>> I agree we need to teach strbuf_read() that xread() is now nicer on
>> O_NONBLOCK; perhaps like this?
>>
>>  strbuf.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/strbuf.c b/strbuf.c
>> index cce5eed..49104d7 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -368,6 +368,8 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t 
>> hint)
>>
>>   cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
>>   if (cnt < 0) {
>> + if (errno == EAGAIN || errno == EWOULDBLOCK)
>> + break;
>>   if (oldalloc == 0)
>>   strbuf_release(sb);
>>   else
>
> If we get EAGAIN on the first read, this will return "0", and I think we
> end up in the "was it EOF, or EAGAIN?" situation I mentioned earlier.
> If we reset errno to "0" at the top of the function, we could get around
> one problem, but it still makes an annoying interface: the caller has to
> check errno for any 0-return to figure out if it was really EOF, or just
> EAGAIN.
>
> If we return -1, though, we have a similar annoyance. If the caller
> notices a -1 return value and finds EAGAIN, they still may need to check
> sb->len to see if they made forward progress and have data they should
> be dealing with.

If errno == EAGAIN, we know it is a non blocking fd, so we could encode
the length read as (- 2 - length), such that

...-2 the length read if EAGAIN was ignored
-1 for error, check errno!
0 for EOF
+1... length if we just read it or restarted it due to EINTR.

The call site should know if it is non blocking (i.e. if the <-2 case can
happen) and handle it appropriately.

Any callsite of today which is unaware of the fd being non blocking would break
by this (as we want to fix the spinning anyway eventually)




>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Stefan Beller
On Thu, Sep 17, 2015 at 9:58 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> On Thu, Sep 17, 2015 at 09:13:40AM -0700, Junio C Hamano wrote:
>>
>>> And your new caller that does O_NONBLOCK wants to do more than
>>> looping upon EWOULDBLOCK.  It certainly would not want us to loop
>>> here.
>>>
>>> So I wonder if you can just O_NONBLOCK the fd and use the usual
>>> strbuf_read(), i.e. without any change in this patch, and update
>>> xread() to _unconditionally_ return when read(2) says EAGAIN or
>>> EWOULDBLOCK.
>>>
>>> What would that break?
>>
>> Certainly anybody who does not realize their descriptor is O_NONBLOCK
>> and is using the spinning for correctness. I tend to think that such
>> sites are wrong, though, and would benefit from us realizing they are
>> spinning.
>
> With or without O_NONBLOCK, not looping around xread() _and_ relying
> on the spinning for "correctness" means the caller is not getting
> correctness anyway, I think, because xread() does return a short
> read, and we deliberately and explicitly do so since 0b6806b9
> (xread, xwrite: limit size of IO to 8MB, 2013-08-20).
>
>> But I think you can't quite get away with leaving strbuf_read untouched
>> in this case. On error, it wants to restore the original value of the
>> strbuf before the strbuf_read call. Which means that we throw away
>> anything read into the strbuf before we get EAGAIN, and the caller never
>> gets to see it.
>
> I agree we need to teach strbuf_read() that xread() is now nicer on
> O_NONBLOCK; perhaps like this?
>
>  strbuf.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index cce5eed..49104d7 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -368,6 +368,8 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t 
> hint)
>
> cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
> if (cnt < 0) {
> +   if (errno == EAGAIN || errno == EWOULDBLOCK)
> +   break;

This would need xread to behave differently too. (if EAGAIN, return)

Looking at xread to answer: "Why did we have a spinning loop in case of
EAGAIN in the first place?" I ended up with 1c15afb9343b (2005-12-19,
xread/xwrite:
do not worry about EINTR at calling sites.)

There we had lots of EAGAIN checks sprinkled through the code base, so we had
non blocking IO back then?

I think I chose to not use xread, as it contradicts everything we want
in the non
blocking case. We want to ignore any operations with a recoverable error (EAGAIN
and EINTR) and keep going with the rest of the program. In the blocking case
(as it is used currently) we can just have the checks from xread moved to
strbuf_read.

> if (oldalloc == 0)
> strbuf_release(sb);
> else
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Jeff King
On Thu, Sep 17, 2015 at 09:58:00AM -0700, Junio C Hamano wrote:

> > Certainly anybody who does not realize their descriptor is O_NONBLOCK
> > and is using the spinning for correctness. I tend to think that such
> > sites are wrong, though, and would benefit from us realizing they are
> > spinning.
> 
> With or without O_NONBLOCK, not looping around xread() _and_ relying
> on the spinning for "correctness" means the caller is not getting
> correctness anyway, I think, because xread() does return a short
> read, and we deliberately and explicitly do so since 0b6806b9
> (xread, xwrite: limit size of IO to 8MB, 2013-08-20).

I think they have to loop for correctness, but they may do this:

  if (xread(fd, buf, len) < 0)
die_errno("OMG, an error!");

which is not correct if "fd" is unknowingly non-blocking. As Stefan
mentioned, we do not set O_NONBLOCK ourselves very much, but I wonder if
we could inherit it from the environment in some cases.

The spinning behavior is not great, but does mean that we spin and
continue rather than bailing with an error.

> > But I think you can't quite get away with leaving strbuf_read untouched
> > in this case. On error, it wants to restore the original value of the
> > strbuf before the strbuf_read call. Which means that we throw away
> > anything read into the strbuf before we get EAGAIN, and the caller never
> > gets to see it.
> 
> I agree we need to teach strbuf_read() that xread() is now nicer on
> O_NONBLOCK; perhaps like this?
> 
>  strbuf.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/strbuf.c b/strbuf.c
> index cce5eed..49104d7 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -368,6 +368,8 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t 
> hint)
>  
>   cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
>   if (cnt < 0) {
> + if (errno == EAGAIN || errno == EWOULDBLOCK)
> + break;
>   if (oldalloc == 0)
>   strbuf_release(sb);
>   else

If we get EAGAIN on the first read, this will return "0", and I think we
end up in the "was it EOF, or EAGAIN?" situation I mentioned earlier.
If we reset errno to "0" at the top of the function, we could get around
one problem, but it still makes an annoying interface: the caller has to
check errno for any 0-return to figure out if it was really EOF, or just
EAGAIN.

If we return -1, though, we have a similar annoyance. If the caller
notices a -1 return value and finds EAGAIN, they still may need to check
sb->len to see if they made forward progress and have data they should
be dealing with.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Jeff King
On Thu, Sep 17, 2015 at 09:44:19AM -0700, Junio C Hamano wrote:

> > Arguably we should actually return the number of bytes we _did_ read,
> > but then caller cannot easily tell the difference between EOF and
> > EAGAIN.
> 
> Why can't it check errno==EAGAIN/EWOULDBLOCK?

Is it trustworthy to check errno if read() has not actually ever
returned -1? E.g., consider this sequence:

  1. somebody (maybe even us calling strbuf_read) calls read() and it
 returns -1, setting errno to EAGAIN

  2. we call strbuf_read()

2a. it calls read(), which returns N bytes

2b. it calls read() again, which returns 0 for EOF

2c. it returns N, because that's how many bytes it read

  3. we wonder if we hit EOF, or if we simply need to read again. We
 check errno == EAGAIN

I don't think the read calls in step 2 are guaranteed to clear errno,
and we might read the cruft from step 1.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Sep 17, 2015 at 09:13:40AM -0700, Junio C Hamano wrote:
>
>> And your new caller that does O_NONBLOCK wants to do more than
>> looping upon EWOULDBLOCK.  It certainly would not want us to loop
>> here.
>> 
>> So I wonder if you can just O_NONBLOCK the fd and use the usual
>> strbuf_read(), i.e. without any change in this patch, and update
>> xread() to _unconditionally_ return when read(2) says EAGAIN or
>> EWOULDBLOCK.
>> 
>> What would that break?
>
> Certainly anybody who does not realize their descriptor is O_NONBLOCK
> and is using the spinning for correctness. I tend to think that such
> sites are wrong, though, and would benefit from us realizing they are
> spinning.

With or without O_NONBLOCK, not looping around xread() _and_ relying
on the spinning for "correctness" means the caller is not getting
correctness anyway, I think, because xread() does return a short
read, and we deliberately and explicitly do so since 0b6806b9
(xread, xwrite: limit size of IO to 8MB, 2013-08-20).

> But I think you can't quite get away with leaving strbuf_read untouched
> in this case. On error, it wants to restore the original value of the
> strbuf before the strbuf_read call. Which means that we throw away
> anything read into the strbuf before we get EAGAIN, and the caller never
> gets to see it.

I agree we need to teach strbuf_read() that xread() is now nicer on
O_NONBLOCK; perhaps like this?

 strbuf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index cce5eed..49104d7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -368,6 +368,8 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 
cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
if (cnt < 0) {
+   if (errno == EAGAIN || errno == EWOULDBLOCK)
+   break;
if (oldalloc == 0)
strbuf_release(sb);
else
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Stefan Beller
On Thu, Sep 17, 2015 at 9:44 AM, Junio C Hamano  wrote:
> On Thu, Sep 17, 2015 at 9:30 AM, Jeff King  wrote:
>>
>> So I think we would probably want to treat EAGAIN specially: return -1
>> to signal to the caller but _don't_ truncate the strbuf.
>
> Yeah, "don't truncate" is needed.
>
>> Arguably we should actually return the number of bytes we _did_ read,
>> but then caller cannot easily tell the difference between EOF and
>> EAGAIN.
>
> Why can't it check errno==EAGAIN/EWOULDBLOCK?

Grepping through Gits sources, there are no occurrences of
explicitly setting  O_NONBLOCK except for the newly introduced
spot in the followup patch in run-command (yes we do poll there).

So how would I find out if a fd is blocking or not in the 82 cases
of strbuf_read? (Now I naively assume they would all block.)

We could also expose the flags in the API. I think IGNORE_EAGAIN
might not be the best now, but rather a NO_TRUNCATE flag would do.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Junio C Hamano
On Thu, Sep 17, 2015 at 9:30 AM, Jeff King  wrote:
>
> So I think we would probably want to treat EAGAIN specially: return -1
> to signal to the caller but _don't_ truncate the strbuf.

Yeah, "don't truncate" is needed.

> Arguably we should actually return the number of bytes we _did_ read,
> but then caller cannot easily tell the difference between EOF and
> EAGAIN.

Why can't it check errno==EAGAIN/EWOULDBLOCK?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Jeff King
On Thu, Sep 17, 2015 at 09:13:40AM -0700, Junio C Hamano wrote:

> And your new caller that does O_NONBLOCK wants to do more than
> looping upon EWOULDBLOCK.  It certainly would not want us to loop
> here.
> 
> So I wonder if you can just O_NONBLOCK the fd and use the usual
> strbuf_read(), i.e. without any change in this patch, and update
> xread() to _unconditionally_ return when read(2) says EAGAIN or
> EWOULDBLOCK.
> 
> What would that break?

Certainly anybody who does not realize their descriptor is O_NONBLOCK
and is using the spinning for correctness. I tend to think that such
sites are wrong, though, and would benefit from us realizing they are
spinning.

But I think you can't quite get away with leaving strbuf_read untouched
in this case. On error, it wants to restore the original value of the
strbuf before the strbuf_read call. Which means that we throw away
anything read into the strbuf before we get EAGAIN, and the caller never
gets to see it.

So I think we would probably want to treat EAGAIN specially: return -1
to signal to the caller but _don't_ truncate the strbuf.

Arguably we should actually return the number of bytes we _did_ read,
but then caller cannot easily tell the difference between EOF and
EAGAIN.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-17 Thread Junio C Hamano
Stefan Beller  writes:

> Subject: Re: [PATCH 01/10] strbuf: Add strbuf_read_noblock

s/Add/add/;

> We need to read from pipes without blocking in a later patch.

I am hoping that you are at least not spinning---i.e. do a poll 
first to make sure there is at least some progress to be made
before calling this.

> Signed-off-by: Stefan Beller 
> ---
>  strbuf.c | 25 +++--
>  strbuf.h |  6 ++
>  2 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/strbuf.c b/strbuf.c
> index cce5eed..4130ee2 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -357,7 +357,10 @@ size_t strbuf_fread(struct strbuf *sb, size_t size, FILE 
> *f)
>   return res;
>  }
>  
> -ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
> +#define IGNORE_EAGAIN (1)
> +
> +static ssize_t strbuf_read_internal(struct strbuf *sb, int fd,
> + size_t hint, int flags)
>  {
>   size_t oldlen = sb->len;
>   size_t oldalloc = sb->alloc;
> @@ -366,8 +369,16 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t 
> hint)
>   for (;;) {
>   ssize_t cnt;
>  
> - cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
> + cnt = read(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
>   if (cnt < 0) {
> + if (errno == EINTR)
> + continue;
> + if (errno == EAGAIN) {
> + if (flags & IGNORE_EAGAIN)
> + break;
> + else
> + continue;
> + }

In order to ensure that this is not breaking the normal case, I had
to look at the implementation of xread() to see it behaves identically
when the flags is not passed.  That one-time review burden implies
that this is adding a maintenance burden to keep this copied function
in sync.

We should also handle EWOULDBLOCK not just EAGAIN.

More importantly, I am not sure if this helper is even necessary.

Looking at xread (reproduced in its full glory):

/*
 * xread() is the same a read(), but it automatically restarts read()
 * operations with a recoverable error (EAGAIN and EINTR). xread()
 * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
 */
ssize_t xread(int fd, void *buf, size_t len)
{
ssize_t nr;
if (len > MAX_IO_SIZE)
len = MAX_IO_SIZE;
while (1) {
nr = read(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
continue;
return nr;
}
}

Are we doing the right thing for EAGAIN here?

Now, man read(2) says this:

   EAGAIN 
The file descriptor fd refers to a file other than a
socket and has been marked nonblocking (O_NONBLOCK),
and the read would block.

   EAGAIN or EWOULDBLOCK
The file descriptor fd refers to a socket and has
been marked nonblocking (O_NONBLOCK), and the read
would block.  POSIX.1-2001 allows either error to be
returned for this case, and does not require these
con‐ stants to have the same value, so a portable
application should check for both possibilities.

If the fd is not marked nonblocking, then we will not get EAGAIN (or
EWOULDBLOCK).

When fd is set to nonblocking, the current xread() spins if read()
says that the operation would block.  What would we achieve by
spinning ourselves here, though?  The caller must have set the fd to
be nonblocking for a reason, and that reason cannot be "if the data
is not ready, we do not have anything better than just spinning for
new data to arrive"---if so, the caller wouldn't have set the fd to
be nonblocking in the first place.

Even if there is such a stupid caller that only wants us to loop,
because we explicitly allow xread() to return a short read, all of
our callers must call it in a loop if they know how much they want
to read.  We can just return from here and let them loop around us.

And your new caller that does O_NONBLOCK wants to do more than
looping upon EWOULDBLOCK.  It certainly would not want us to loop
here.

So I wonder if you can just O_NONBLOCK the fd and use the usual
strbuf_read(), i.e. without any change in this patch, and update
xread() to _unconditionally_ return when read(2) says EAGAIN or
EWOULDBLOCK.

What would that break?

>   if (oldalloc == 0)
>   strbuf_release(sb);
>   else
> @@ -384,6 +395,16 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t 
> hint)
>   return sb->len

[PATCH 01/10] strbuf: Add strbuf_read_noblock

2015-09-16 Thread Stefan Beller
We need to read from pipes without blocking in a later patch.

Signed-off-by: Stefan Beller 
---
 strbuf.c | 25 +++--
 strbuf.h |  6 ++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index cce5eed..4130ee2 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -357,7 +357,10 @@ size_t strbuf_fread(struct strbuf *sb, size_t size, FILE 
*f)
return res;
 }
 
-ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
+#define IGNORE_EAGAIN (1)
+
+static ssize_t strbuf_read_internal(struct strbuf *sb, int fd,
+   size_t hint, int flags)
 {
size_t oldlen = sb->len;
size_t oldalloc = sb->alloc;
@@ -366,8 +369,16 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
for (;;) {
ssize_t cnt;
 
-   cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
+   cnt = read(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
if (cnt < 0) {
+   if (errno == EINTR)
+   continue;
+   if (errno == EAGAIN) {
+   if (flags & IGNORE_EAGAIN)
+   break;
+   else
+   continue;
+   }
if (oldalloc == 0)
strbuf_release(sb);
else
@@ -384,6 +395,16 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
return sb->len - oldlen;
 }
 
+ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
+{
+   return strbuf_read_internal(sb, fd, hint, 0);
+}
+
+ssize_t strbuf_read_noblock(struct strbuf *sb, int fd, size_t hint)
+{
+   return strbuf_read_internal(sb, fd, hint, IGNORE_EAGAIN);
+}
+
 #define STRBUF_MAXLINK (2*PATH_MAX)
 
 int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint)
diff --git a/strbuf.h b/strbuf.h
index aef2794..23ca7aa 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,12 @@ extern size_t strbuf_fread(struct strbuf *, size_t, FILE 
*);
 extern ssize_t strbuf_read(struct strbuf *, int fd, size_t hint);
 
 /**
+ * Same as strbuf_read, just returns non-blockingly by ignoring EAGAIN.
+ * The fd must have set O_NONBLOCK.
+ */
+extern ssize_t strbuf_read_noblock(struct strbuf *, int fd, size_t hint);
+
+/**
  * Read the contents of a file, specified by its path. The third argument
  * can be used to give a hint about the file size, to avoid reallocs.
  */
-- 
2.6.0.rc0.131.gf624c3d

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html