Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-25 Thread Jamie Lokier

Ingo Molnar wrote:
> this is what TUX uses. When a eg. static HTTP request arrives it sends
> reply headers shortly after having checked file permissions and stuff (but
> the file is not yet sent), with MSG_MORE set. Then it sends the file, and
> sendfile() keeps MSG_MORE set right until the end of the request, when it
> clears it for the last fragment so the last partial packet gets flushed to
> the network. In fact there is one more optimization here, if the request
> is not keepalive then TUX still keeps MSG_MORE set, and closes the socket -
> which will implicitly flush the output queue anyway and send any partial
> packet, but will also have the FIN packet merged with the last outgoing
> packet.

... is it possible to control the MSG_MORE flag for sendfile's final
packet from user space, or do you have to use TCP_CORK to get the control?

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-23 Thread Cacophonix

http://community.roxen.com/developers/idocs/drafts/draft-minshall-nagle-01.txt

There's also a paper on the specific issues:
http://www.cc.gatech.edu/fac/Ellen.Zegura/wisp99/papers/minshall.ps

You may also want to look up the ietf tcp-impl archive from '99 for discussions
on the draft.

cheers,
karthik


--- Guus Sliepen ([EMAIL PROTECTED]) wrote:

On Sat, Jan 20, 2001 at 10:39:36PM +0300, Alexey Kuznetsov wrote: 

> Yes. It is cost, which we have to pay. Look into Minshall's draft, 
> by the way (draft-minshall-nagle-*), it discusses pros and contras. 

What kind of draft is that? I can't find it on the IETF site. Could you 
provide me with a link? 

--- 
Met vriendelijke groet / with kind regards, 
  Guus Sliepen <[EMAIL PROTECTED]> 
--- 




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-22 Thread dean gaudet

On 20 Jan 2001, Kai Henningsen wrote:

> [EMAIL PROTECTED] (dean gaudet)  wrote on 18.01.01 in 
><[EMAIL PROTECTED]>:
>
> > i'm pretty sure the actual use of pipelining is pretty disappointing.
> > the work i did in apache preceded the widespread use of HTTP/1.1 and we
>
> What widespread use of HTTP/1.1?
>
> I just tried the following exercise:
>
> Request a nonexistent page with HTTP/1.1 syntax.
>
> a. Directly from Apache: I get a nice chunked HTTP/1.1 answer.
> b. Via Squid: I get a plain HTTP/1.0 answer.
>
> As long as not even Squid talks 1.1, how can we expect browsers to do it?
>
> WebMUX? In a thousand years perhaps.

what's the widespread use of ECN?  or SACK when that was first put in?
what about ipchains before 2.2 was released?

why bother being the first to implement anything new, might as well wait
for the commercial folks to put it into a product and spread it wide and
far eh?

i'm pretty sure i said that it was our (the apache group's) position that
we wanted as perfect as possible of a pipelining implementation so that
should someone finally do a client-side version then there wouldn't be
apache bottlenecks in the way.  i still think that's the right attitude.
if we'd left the packet boundaries in there then there wouldn't even be
motivation to bother doing a client-side pipelining implementation,
there'd be little or no benefit.

btw, HTTP/1.1 proxying is more challenging than HTTP/1.0 proxying which is
probably why squid doesn't support it yet (nor does the apache proxy
module).

-dean




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-22 Thread Rick Jones

[EMAIL PROTECTED] wrote:
> 
> Hello!
> 
> >   is there really
> > much value in the second request flowing to the server before the first
> > byte of the reply has hit?
> 
> Yes, of course, it has lots of sense: f.e. all the icons, referenced
> parent page are batched to single well-coalesced stream without rtt delays
> between them. It is the only sense of pipelining yet.

"Elsewhere" i see references stating that the typical RTT for the great
unwashed masses is somewhere in the range of 100 to 200 milliseconds.

The linux standalone ACK timer is 200 milliseconds yes?

If the web server is going to take longer than 200 milliseconds to
generate the first byte of the reply to the first request it seems that
the bottleneck here is the web server, not the link RTT.

Now, if the server _is_ able to respond with the first bytes (ignoring
CORK for the moment) sooner than the standalone ACK timer, then perhaps
the RTT is an issue. However, as we were in the constrained case of only
two requests, I suspect that it is not a big deal.

If there are all those icons to be displayed, there would be more than
two requests. Without the explicit (cork et al)/implicit (TCP_NODELAY)
push at the client those 2-N requests will pile-up into a nice sized TCP
segment. Those requests will arrive en masse at the server and will then
have RTT issues amortized.

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-21 Thread kuznet

Hello!

> So now the question is: when does this new nagle algorithm delay packets in the
> write queue? It _must_ do something, otherwise TCP_NODELAY would obviously be a
> noop.

It allows _one_ incomplete segment to fly. Minshall and BSD behave absolutely
similarly in all the circumstances except for some exceptional situations.

Andrea, I have better question: what "classic" Nagle algorithm
does in this case? Just think and you will find the issue non-trivial. 8)8)


> This was all my wondering about uncorking not being equivalent to SIOCPUSH.

The samples are mostly equivalent.

The difference may happen depending on previous history of the connection.



> If I understood well they would have been equivalent if I did write(10*MSS)
> instead of write(10*MSS+1).

Under CORK you may do everything. Segmentation will be perfect in any case.

I spoke about default behaviour.


> IMHO latency can be fixed in a much better way using ioctl(SIOCPUSH)

No doubts.

Only all these tricks with CORKs, MOREs, PUSHes are orthogonal issue.
Nagling is used on generic connections, which do not know and do not want
to know anything about problems of tcp, and just write bytes to socket
and read bytes from it.


>   Legacy userspace

Please, rewind thread and reread Linus' mail containing mantra
to read in such hard cases. 8)

Applications not using corking or some other our home-made extensions
are not "legacy". The word is different: they are "standard"
de facto and de jure.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread James Sutherland

On Sun, 21 Jan 2001, Lincoln Dale wrote:

> hi,
>
> At 04:56 PM 20/01/2001 +0200, Kai Henningsen wrote:
> >[EMAIL PROTECTED] (dean gaudet)  wrote on 18.01.01 in
> ><[EMAIL PROTECTED]>:
> > > i'm pretty sure the actual use of pipelining is pretty disappointing.
> > > the work i did in apache preceded the widespread use of HTTP/1.1 and we
> >
> >What widespread use of HTTP/1.1?
>
> this is probably digressing significantly from linux-kernel related issues,
> but i would say that HTTP/1.1 usage is more widespread than you probably
> think.
>
> from the statistics of a beta site running a commercial transparent caching
> software:
>  # show statistics http requests
>            Statistics - Requests
>                               Total       %
>  ------------------------------------------
>  ...
>  HTTP 0.9 Requests:           41907     0.0
>  HTTP 1.0 Requests:        37563201    24.1
>  HTTP 1.1 Requests:       118282092    75.9
>  HTTP Unknown Requests:           1     0.0

IIRC, the discrepancy is because some browsers (IE, not sure about
Netscape) default to speaking HTTP/1.0 to a proxy if they are configured
to use one - a chicken and egg situation, AFAICS: proxies are assumed to
be HTTP 1.0 only, so browsers only send HTTP 1.0 requests - so people look
at the logs and think 90% of their users are HTTP/1.0 only anyway, so
there's no point in supporting 1.1 yet...

Also something to do with HTTP 1.1 being much harder to support properly
in proxies due to the new cache control features etc.??

James.




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Lincoln Dale

hi,

At 04:56 PM 20/01/2001 +0200, Kai Henningsen wrote:
>[EMAIL PROTECTED] (dean gaudet)  wrote on 18.01.01 in
><[EMAIL PROTECTED]>:
> > i'm pretty sure the actual use of pipelining is pretty disappointing.
> > the work i did in apache preceded the widespread use of HTTP/1.1 and we
>
>What widespread use of HTTP/1.1?

this is probably digressing significantly from linux-kernel related issues,
but i would say that HTTP/1.1 usage is more widespread than you probably
think.

from the statistics of a beta site running a commercial transparent caching
software:
 # show statistics http requests
           Statistics - Requests
                              Total       %
 ------------------------------------------
 ...
 HTTP 0.9 Requests:           41907     0.0
 HTTP 1.0 Requests:        37563201    24.1
 HTTP 1.1 Requests:       118282092    75.9
 HTTP Unknown Requests:           1     0.0
 ...


cheers,

lincoln.


Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Guus Sliepen

On Sat, Jan 20, 2001 at 10:39:36PM +0300, Alexey Kuznetsov wrote:

> Yes. It is cost, which we have to pay. Look into Minshall's draft,
> by the way (draft-minshall-nagle-*), it discusses pros and contras.

What kind of draft is that? I can't find it on the IETF site. Could you
provide me with a link?

---
Met vriendelijke groet / with kind regards,
  Guus Sliepen <[EMAIL PROTECTED]>
---
See also: http://tinc.nl.linux.org/
  http://www.kernelbench.org/
---



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Andrea Arcangeli

On Sat, Jan 20, 2001 at 11:22:14PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > >   write(10*MSS)
> > >   write(1)
> > >   write(1)
> ...
> > As far as I can tell, the second "write(1)" will always merge with the
> > first one
> 
> This would be true, if Andrea wrote not exactly 10*MSS,
> but 10*MSS+1 or just write().

So then this:

val = 1
setsockopt(TCP_CORK, &val) /* cork */

write(10*MSS+1)
write(1)

val = 0
setsockopt(TCP_CORK, &val) /* uncork */

is different from this:

val = 1
setsockopt(TCP_CORK, &val) /* cork */

write(10*MSS+1)
write(1)

val = 0
setsockopt(TCP_CORK, &val) /* uncork */

val = 1
setsockopt(TCP_NODELAY, &val)
val = 0
setsockopt(TCP_NODELAY, &val)

This was all my wondering about uncorking not being equivalent to SIOCPUSH.
(the two setsockopt(TCP_NODELAY) can be replaced with a SIOCPUSH of course)
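
The two sequences above can be sketched in C roughly as follows, assuming a
Linux TCP socket. The helper names (set_tcp_flag, cork, uncork,
nodelay_pulse) are made up for this illustration, and the TCP_NODELAY
on/off pair only stands in for the proposed SIOCPUSH, which is not an
existing interface.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set one TCP-level integer option; returns 0 on success, -1 on error. */
int set_tcp_flag(int sock, int opt, int val)
{
    return setsockopt(sock, IPPROTO_TCP, opt, &val, sizeof(val));
}

/* First variant: cork, do the writes, then uncork. */
int cork(int sock)   { return set_tcp_flag(sock, TCP_CORK, 1); }
int uncork(int sock) { return set_tcp_flag(sock, TCP_CORK, 0); }

/*
 * Extra step of the second variant: switch Nagle off and straight back
 * on again. Returns 0 on success, -1 on error.
 */
int nodelay_pulse(int sock)
{
    assert(sock >= 0);
    if (set_tcp_flag(sock, TCP_NODELAY, 1) < 0)
        return -1;
    return set_tcp_flag(sock, TCP_NODELAY, 0);
}
```

A caller would run cork(), the two write() calls and uncork() for the first
variant, and add nodelay_pulse() afterwards for the second.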

If I understood well they would have been equivalent if I did write(10*MSS)
instead of write(10*MSS+1).

(and btw, the TCP_NODELAY being cleared immediately means that if the last
packet can't be sent during the setsockopt it will be delayed with nagle as
usual when we get the acknowledgments from the receiver, that's the same
mistake of my SIOCPUSH first implementation that in fact should be slightly
improved in the way described a few emails ago adding a tp->push field and
then it will become different than going for a moment in TCP_NODELAY mode)

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Andrea Arcangeli

On Sat, Jan 20, 2001 at 10:39:36PM +0300, [EMAIL PROTECTED] wrote:
> Much saner behaviour wrt latency (and perfect clarity) outweighs

IMHO latency can be fixed in a much better way using ioctl(SIOCPUSH) after the
last write() plus we could also add a MSG_NOMORE to set in the last send().
MSG_NOMORE would be not anonymous-capable but it would have zero syscall cost
compared to SIOCPUSH.

This would also address when the stack still delays something by mistake, and
yes it must still delay something sometime (even if not in my example)
otherwise setsockopt(TCP_NODELAY) is a noop. Whatever heuristic you use it
can't get right all the cases because it misses information that it can't
recover by guessing the right thing to do.  Legacy userspace will run still
fine, optimized software will take advantage of the new capability of the
stack.

In short instead of decreasing merging capabilities of the stack IMHO it's
better to give the information from userspace to the stack to let it know when
it must stop to do merging. This way the stack will always do the right thing.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Andrea Arcangeli

On Sat, Jan 20, 2001 at 11:39:30AM -0800, Linus Torvalds wrote:
> As far as I can tell, the second "write(1)" will always merge with the
> first one - unless the first one has already been sent out, [..]

Here the question is only if the first write(1) will be still there when we do the
second write(1).

If the first write(1) is still there it will of course merge fine with the second
write(1) as usual.

With 2.4 we'll send out the first write(1) immediately on the wire if cwnd and
receiver window allows that without caring that there are packets out. While
the classical nagle (the only one I know about) would wait for the second
write(1) to arrive until all the previously sent data has been acknowledged by
the receiver (to give to the second write(1) the time to be merged with the
first write(1)).

It's not obvious to me that this new algorithm is a good thing but I won't
argue if this is the new standard algorithm.

So now the question is: when does this new nagle algorithm delay packets in the
write queue? It _must_ do something, otherwise TCP_NODELAY would obviously be a
noop.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread kuznet

Hello!

> > write(10*MSS)
> > write(1)
> > write(1)
...
> As far as I can tell, the second "write(1)" will always merge with the
> first one

This would be true, if Andrea wrote not exactly 10*MSS,
but 10*MSS+1 or just write().

In some exceptional situations (sort of writing exactly N*MSS,
then remnant, then something) Minshall's and bsd coalescing
are a bit different.
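
The difference between the two write patterns can be traced with a toy
model. This is only a sketch under stated assumptions: it models a
Minshall-style rule (one sub-MSS segment may be unacknowledged at a time),
assumes that no ACK arrives and that no window limits sending, and uses
made-up names (struct sim, sim_write) that are not kernel identifiers.

```c
#include <assert.h>

/* Toy model of a sender; names are illustrative, not kernel fields. */
struct sim {
    unsigned mss;            /* maximum segment size in bytes */
    unsigned queued;         /* bytes held in the unsent tail segment */
    unsigned sent;           /* segments put on the wire so far */
    int small_in_flight;     /* 1 if a sub-MSS segment is unacknowledged */
};

/* Apply one write() of len bytes under the Minshall-style rule. */
void sim_write(struct sim *s, unsigned len)
{
    assert(s->mss > 0);
    s->queued += len;

    /* Full-sized segments are always sent. */
    while (s->queued >= s->mss) {
        s->queued -= s->mss;
        s->sent++;
    }

    /* One small segment may fly; later small data waits and coalesces. */
    if (s->queued > 0 && !s->small_in_flight) {
        s->queued = 0;
        s->sent++;
        s->small_in_flight = 1;
    }
}
```

In this model write(10*MSS), write(1), write(1) ends with 11 segments sent
and one byte still queued, while write(10*MSS+1), write(1), write(1) ends
with 11 segments sent and the two single bytes queued together.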

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread kuznet

Hello!

> So this mean if I do:

Yes. It is cost, which we have to pay. Look into Minshall's draft,
by the way (draft-minshall-nagle-*), it discusses pros and contras.

Much saner behaviour wrt latency (and perfect clarity) outweighs
a bit worse coalescing.

Alexey




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Kai Henningsen

[EMAIL PROTECTED] (dean gaudet)  wrote on 18.01.01 in 
<[EMAIL PROTECTED]>:

> i'm pretty sure the actual use of pipelining is pretty disappointing.
> the work i did in apache preceded the widespread use of HTTP/1.1 and we

What widespread use of HTTP/1.1?

I just tried the following exercise:

Request a nonexistent page with HTTP/1.1 syntax.

a. Directly from Apache: I get a nice chunked HTTP/1.1 answer.
b. Via Squid: I get a plain HTTP/1.0 answer.

As long as not even Squid talks 1.1, how can we expect browsers to do it?

WebMUX? In a thousand years perhaps.

MfG Kai



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Linus Torvalds



On Sat, 20 Jan 2001, Andrea Arcangeli wrote:

> On Sat, Jan 20, 2001 at 10:05:45PM +0300, [EMAIL PROTECTED] wrote:
> > It makes. One small packet is allowed to fly, not depending on packets_out.
> 
> So this mean if I do:
> 
>   write(10*MSS)
>   write(1)
>   write(1)
> 
> 2.4 can send 10 packet with MSS large payload plus two packets with a
> payload of 1 byte even if during the two write(1) the previous packets were
> still out (not acknowledged yet). Classical nagle would send 10 packet with
> MSS large payload plus 1 packet with a two bytes payload in the same
> scenario.

As far as I can tell, the second "write(1)" will always merge with the
first one - unless the first one has already been sent out, of course (in
which classical nagle would have done the same thing).

So I think we'd do TheRightThing(tm) regardless. But maybe I misread.

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Andrea Arcangeli

On Sat, Jan 20, 2001 at 10:05:45PM +0300, [EMAIL PROTECTED] wrote:
> It makes. One small packet is allowed to fly, not depending on packets_out.

So this mean if I do:

write(10*MSS)
write(1)
write(1)

2.4 can send 10 packet with MSS large payload plus two packets with a
payload of 1 byte even if during the two write(1) the previous packets were
still out (not acknowledged yet). Classical nagle would send 10 packet with
MSS large payload plus 1 packet with a two bytes payload in the same
scenario.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread kuznet

Hello!

> semantics of snd_sml), maybe it makes the difference but then I don't see how.

It makes. One small packet is allowed to fly, not depending on packets_out.
This is idea of Minshall.

"Classic" Nagle also does not prohibit this, but it is difficult
to formulate it in terms of presegmented queue, used in linux,
so that we preferred Minshall's approach.

Third algo, used by linux-2.2: "queue as soon as packet is small
and packets_out!=0" is _not_ Nagle's one, it is surely wrong.
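
As a rough illustration of two of the rules named above, here is a
simplified sketch of the hold-or-send decision for a single segment. It
assumes the send window is open, ignores FIN and CORK handling, and uses
illustrative names (struct tx_state, hold_22, hold_minshall) rather than
the actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified sender state; field names are illustrative only. */
struct tx_state {
    unsigned mss;           /* maximum segment size in bytes */
    unsigned packets_out;   /* segments sent but not yet acknowledged */
    bool small_in_flight;   /* an unacknowledged sub-MSS segment exists */
};

/* Rule attributed to linux-2.2 above: hold any small segment while
 * anything at all is unacknowledged. */
bool hold_22(const struct tx_state *s, unsigned len)
{
    assert(s->mss > 0);
    return len < s->mss && s->packets_out != 0;
}

/* Minshall-style rule: hold a small segment only while another small
 * segment is still unacknowledged. */
bool hold_minshall(const struct tx_state *s, unsigned len)
{
    assert(s->mss > 0);
    return len < s->mss && s->packets_out != 0 && s->small_in_flight;
}
```

With ten full segments outstanding and no small one, hold_22() holds a
1-byte segment while hold_minshall() lets it go, which is the "one small
packet is allowed to fly" case.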

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Andrea Arcangeli

On Sat, Jan 20, 2001 at 08:28:04PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > My argument applies to 2.4. The uncork _won't_ push on the wire the last
> > not mss-sized fragment until it's the last one in the write queue even once
> > cwnd and receiver window allows that. I think 
> 
> Look at the code again. You misread it.

This is possible indeed, but after a few minutes of re-reading I still
understand the same thing as yesterday: if there's only 1 packet in the write
queue pointed to by the send_head, nonagle will remain zero, tcp_nagle_check
will get nonagle zero as well and it will compute the default nagle algorithm,
so this last packet will be delayed normally according to the nagle algorithm.
I can think of a:
 
CORK
write(1)
UNCORK
 
and the 1 byte of payload will stay there until there is nothing in flight
and all previously sent data has been acknowledged by the receiver. It seems
even more clear that we can delay the push on the wire of the last packet with
the uncorking if the receiver window or the congestion window forbid us to xmit
such packet immediately in the uncork call. I admit I didn't look at all into
the tcp_minshall_check due to lack of time (that in turn means I don't know the
semantics of snd_sml), maybe it makes the difference but then I don't see how.
The rest seems quite obvious, packets_out indicates that we sent packets and
that those packets aren't acknowledged yet.  I would have written a testcase to
try in practice what happens but I'm busy, I apologise for bothering if I'm
totally wrong.  However if you have the time you would do me a favour if you
could tell _where_ my understanding of the code is wrong and what exactly I'm
missing.  Thx!

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread kuznet

Hello!

>   is there really
> much value in the second request flowing to the server before the first
> byte of the reply has hit?

Yes, of course, it has lots of sense: f.e. all the icons, referenced
parent page are batched to single well-coalesced stream without rtt delays
between them. It is the only sense of pipelining yet.

Otherwise, you get http/1.0 with keepalives.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread Abramo Bagnara

[EMAIL PROTECTED] wrote:
> 
> 
> > The manpage disagrees with you:
> 
> Do you jest?
> 
> This manpage is wrong, or, to be more exact, is incomplete,
> which is common flaw of them.
> 
> get/set mean nothing but read-only/changing, i.e.
> another important property missing in ioctl interface.

setsockopt i.e. set socket options

I think that Andrea's point is that SIOCPUSH doesn't set anything (i.e.
doesn't change a state), but asks for an action to be done now.

For this reason the name setsockopt is counter intuitive (apart from man
page) and it seems to violate the principle of least surprise.

I think this point of view is easily agreeable.

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project is            http://www.alsa-project.org
sponsored by SuSE Linux    http://www.suse.com

It sounds good!



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-20 Thread kuznet

Hello!

> My argument applies to 2.4. The uncork _won't_ push on the wire the last
> not mss-sized fragment until it's the last one in the write queue even once
> cwnd and receiver window allows that. I think 

Look at the code again. You misread it.


> wouldn't be setting nonagle unconditionally to 1 in

You could guess that "1" in variable with name "nodelay" means nodelay. 8)

The checks for tcp_skb_is_last() have nothing to do with nagling.
Look into comments, they explain all the whys and whos.


> The manpage disagrees with you:

Do you jest?

This manpage is wrong, or, to be more exact, is incomplete,
which is common flaw of them.

get/set mean nothing but read-only/changing, i.e.
another important property missing in ioctl interface.

Andrea, how long do you use manpages when designing? 8)


> The BKL will be moved down in the next months

Excellent.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread Rick Jones


> Look: http-1.1, asynchronous one, the first request is sent, but not acked.
> Time to send the second one, but it is blocked by Nagle. If there is no
> third request, the pipe stalls. Seems, this situation will be usual,
> when http-1.1 will start to be used by clients, because of dependencies
> between replies (references) frequently move it to http-1.0 synchronous
> mode, but with some data in flight. See?

The stall takes place if and only if the web server takes longer than
the standalone ACK timer to generate the first bytes of reply. Once the
first bytes of the reply hit the client, the client's second request
will flow.

If the web server takes longer than the standalone ACK timer to generate
the first bytes of the reply, there is no particular value in the second
request having arrived anyway - it will simply sit queued in the
server's stack rather than the client's stack. You could argue that the
server could start serving the second request, but it still has to hold
the reply and keep it queued until the first reply is complete, and I
suspect there is little value in working for that much parallelism here.
Better to have as much queuing in your most distributed resource - the
clients.

Further, even ignoring the issue of standalone acks, is there really
much value in the second request flowing to the server before the first
byte of the reply has hit? I would think that the parallelism in the
server is going to be among all the different sources of request, not
from within a given source of requests. 

Also, if the browser is indeed going to do pipelined requests, and
getting the requests to the server as quickly as possible was
indeed required because the requests could be started in parallel (just
how likely that is I have no idea) i would have thought that it would
(could) want go through the page, gather-up all the URLs from the given
server, and then dump all those requests into the connection at once
(modulo various folks dislike of sendmsg and writev  :). We are in this
instance at least talking about purpose coded software anyhow and not a
random CGI dribbler. In that sense, the "logically associated data" are
all the server's URL's from that page. Yes, this paragraph is in slight
contradiction with my statement above about keeping things queued in the
client :)
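
The idea of dumping all the gathered requests into the connection at once
could look roughly like this with writev(). It is a minimal sketch: the
function name send_batched and the MAX_BATCH cap are assumptions made for
the example, and short writes are not handled.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_BATCH 16  /* arbitrary cap for this sketch */

/*
 * Hypothetical helper: hand n request strings to the kernel in one
 * writev() call. Returns the byte count from writev(), or -1 on error.
 * A real client would also need to handle short writes.
 */
ssize_t send_batched(int fd, const char *const reqs[], size_t n)
{
    struct iovec iov[MAX_BATCH];
    size_t i;

    assert(n > 0 && n <= MAX_BATCH);
    for (i = 0; i < n; i++) {
        iov[i].iov_base = (void *)reqs[i];  /* writev() only reads it */
        iov[i].iov_len = strlen(reqs[i]);
    }
    return writev(fd, iov, (int)n);
}
```

Because the requests reach the stack in a single call, they can be
segmented together rather than going out as one small packet per request.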

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread Andrea Arcangeli

On Fri, Jan 19, 2001 at 09:18:04PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > The "uncork" won't push the last skb on the wire if there is not acknowledged
> > data in the write_queue and the payload of the last skb in the write_queue
> > isn't large MSS. This because the `uncork' will only re-evaluate the
> > write_queue in function of the _nagle_ algorithm, quite correctly because the
> > "uncork" will move from "cork" to "nagle" (not from "cork" to "nodelay").
> 
> At least for your own implementation of SIOCPUSH has "1" among
> arguments of push_pending_frames, so that this does not happen. 8)8)

I wasn't talking about SIOCPUSH but only about uncorking, I carefully made sure
that SIOCPUSH was interpreting the write_queue from a "nodelay" point of view
(but I didn't notice the uncorking wasn't doing that ;).

> The second thing, which makes argument above wrong is that
> both classic Nagle and linux-2.4 Nagle (extended by Minshall),
> do not have this problem. Your argument applies to buggy flavor
> of nagling specific to 2.2.

My argument applies to 2.4. The uncork _won't_ push on the wire the last
not mss-sized fragment until it's the last one in the write queue even once
cwnd and receiver window allows that. I think if there would be no problem you
wouldn't be setting nonagle unconditionally to 1 in
2.4/include/net/tcp.h.__tcp_push_pending_frames:

        if (!tcp_skb_is_last(sk, skb))
                nonagle = 1;
        if (!tcp_snd_test(tp, skb, cur_mss, nonagle) ||
            tcp_write_xmit(sk))
                tcp_check_probe_timer(sk, tp);

> However, SIOCPUSH really affects latency badly in some circumstances.

Not using SIOCPUSH can only increase latency, and using it can only decrease
latency. Obviously because it allows the last little fragment to be sent before
waiting it to be the last one in the write queue so reducing the time the last
fragment is received from the other end and so in turn reducing the latency of
the reply from the other end. There are no other differences in the behaviour.

It tells to the stack one information that the stack can't know otherwise and
that only the programmer writing the application knows. Knowing that
information can only allow the stack to do a _better_ choice.

What my SIOCPUSH implementation was missing is to keep considering the
write_queue in tp->nonagle=1 mode until somebody writes to the socket (so that
as soon as one acknowledgement for the previous data increases the receiver
window and our send window increases as well we can send the last fragment
immediately without waiting it to be the last one). The first write to the
socket will clear tp->push.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread Andrea Arcangeli

On Fri, Jan 19, 2001 at 08:52:53PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > I thought setsockopt is meant to set an option in the socket,
> 
> It is not.

The manpage disagrees with you:

   getsockopt, setsockopt - get and set options on sockets 
 
^^

SIOCPUSH doesn't fit in get/setsockopt, face it.

> > controls the I/O (aka ioctl ;). Anyways either ioctl or setsockopt is fine in
> > practice
> 
> After BKL is moved down.

The BKL will be moved down in the next months regardless of where we put the
functionality, as said in my original email choosing in function of the BKL in
ioctl VFS is wrong IMHO.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread kuznet

Hello!

> the business about the last 1100ish bytes of a 4096 byte send being
> delayed by nagle only implies that the stack's implementation of nagle
> was broken and interpreting it on a per-segment rather than a per-send
> basis. 
+
> software, or the host TCP stack. otherwise, the persistent connections
> would have worked just fine.

Exactly.


But actually there is a situation (in HTTP/1.1, though not in the wild
as things stand now 8)) where an explicit push is required even with ideal
nagling.

Look: HTTP/1.1, the asynchronous kind. The first request is sent but not yet
acked. It is time to send the second one, but it is blocked by Nagle. If there
is no third request, the pipe stalls. It seems this situation will be common
once clients start to use HTTP/1.1, because dependencies between replies
(references) frequently force it into HTTP/1.0-style synchronous mode, but
with some data in flight. See?

The solution is evident. On this kind of connection an explicit push
must be made as soon as we complete some request _and_ there are no
more pending requests in the queue.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread Rick Jones

dean gaudet wrote:
> 
> On Wed, 17 Jan 2001, Rick Jones wrote:
> 
> > > actually the problem isn't nagle...  nagle needs to be turned off for
> > > efficient servers anyhow.
> >
> > i'm not sure I follow that. could you expand on that a bit?
> 
> the problem which caused us to disable nagle in apache is documented in
> this paper .  mind you
> i should personally revisit the paper after all these years so that i can
> reconsider its implications in the context of pipelining and webmux.

ah yes, that - where the web server even for just static content was
providing the replies in more than one send. i would not consider that
to have been an "efficient" server.

i'm not sure that I agree with their statement that piggy-backing is
rarely successful in request/response situations.

the business about the last 1100ish bytes of a 4096 byte send being
delayed by nagle only implies that the stack's implementation of nagle
was broken and interpreting it on a per-segment rather than a per-send
basis. if the app sends 4096 bytes, then there should be no
nagle-induced delays on a connection with an MSS of 4096 or less.
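
A toy model of the two readings, assuming a 1460-byte MSS (a typical
Ethernet figure that makes the "1100ish" tail come out; the thread does
not state the MSS) and earlier data still unacknowledged. The function name
and the per_send switch are made up for illustration; this is only the
arithmetic, not any stack's actual code.

```c
#include <stddef.h>

/*
 * Bytes of one send() that Nagle would hold back while earlier data is
 * still unacknowledged.
 *
 * per_send == 0: the per-segment reading. The send is cut into
 *                MSS-sized segments and the short tail is held.
 * per_send != 0: the per-send reading. Only a send that is itself
 *                smaller than one MSS is held.
 */
static size_t nagle_held_bytes(size_t send_len, size_t mss,
                               int unacked_in_flight, int per_send)
{
    if (!unacked_in_flight || mss == 0)
        return 0;               /* nothing outstanding: send at once */
    if (per_send)
        return send_len < mss ? send_len : 0;
    return send_len % mss;      /* sub-MSS tail of the send */
}
```

With send_len = 4096 and mss = 1460 the per-segment reading holds
4096 - 2 * 1460 = 1176 bytes, while the per-send reading holds none.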

it would seem that in the context of that paper at least, most if not
all of the problems were the result of bugs - either in the webserver
software, or the host TCP stack. otherwise, the persistent connections
would have worked just fine.

> i'm not aware yet of any study in the field.  and i'm out of touch enough
> with the clients that i don't know if new netscape or IE have finally
> begun to use pipelining (they hadn't as of 1998).

someone else sent a private email implying that no browsers were yet
doing pipelining.

rick
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread kuznet

Hello!

> The "uncork" won't push the last skb onto the wire if there is unacknowledged
> data in the write_queue and the payload of the last skb in the write_queue
> is smaller than the MSS. This is because the `uncork' only re-evaluates the
> write_queue according to the _nagle_ algorithm, quite correctly, because the
> "uncork" moves from "cork" to "nagle" (not from "cork" to "nodelay").

At least your own implementation of SIOCPUSH passes "1" among the
arguments of push_pending_frames, so this does not happen. 8)8)

The second thing that makes the argument above wrong is that
both classic Nagle and linux-2.4 Nagle (extended by Minshall)
do not have this problem. Your argument applies to the buggy flavor
of nagling specific to 2.2.


However, SIOCPUSH really affects latency badly in some circumstances.
Actually, Ingo already learned this from bad experience with TUX. 8)
Namely, when push cannot push anything due to congestion window
or receiver window, http/1.0 style synchronous connections suffer.
The lesson is that when reply is better to be disabled.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread kuznet

Hello!

> I thought setsockopt is meant to set an option in the socket,

It is not.

setsockopt() is simply a somewhat cleverer extension of ioctl(),
which is adapted (in BSD style, though) to understand layering
and has an explicit length for the data.

It is preferred for all operations on sockets,
and it is a "must" if the argument is not a plain integer or the operation
is specific to some protocol layer.


> controls the I/O (aka ioctl ;). Anyways either ioctl or setsockopt is fine in
> practice

After BKL is moved down.



> NAGLE algorithm is only one, CORK algorithm is another different algorithm.

It is one algorithm. They differ only in the number of incomplete segments
allowed to be in flight, i.e. (in order of increasing latency):

"nodelay" - Infinity
unnamed   - > 1
"nagle"   - 1
"cork"    - 0

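A minimal sketch of that table as code, assuming "incomplete" means a
segment shorter than the MSS. The enum, the function names and the value 2
for the unnamed "> 1" row are invented for illustration; the kernel does not
store the modes this way.

```c
#include <limits.h>

/* Modes from the table, in order of increasing latency. */
enum seg_mode { MODE_NODELAY, MODE_UNNAMED, MODE_NAGLE, MODE_CORK };

/* Incomplete (sub-MSS) segments a mode tolerates in flight at once. */
static unsigned int max_incomplete_in_flight(enum seg_mode mode)
{
    switch (mode) {
    case MODE_NODELAY: return UINT_MAX; /* "Infinity" */
    case MODE_UNNAMED: return 2;        /* "> 1"; 2 is an arbitrary pick */
    case MODE_NAGLE:   return 1;
    case MODE_CORK:    return 0;
    }
    return 0;
}

/*
 * May one more segment of seg_len bytes go out now? Full-sized segments
 * always may; a short one only while the count of short segments
 * already in flight is below the mode's limit.
 */
static int may_send_now(enum seg_mode mode, unsigned int seg_len,
                        unsigned int mss, unsigned int incomplete_in_flight)
{
    if (seg_len >= mss)
        return 1;
    return incomplete_in_flight < max_incomplete_in_flight(mode);
}
```

Seen this way, "cork" never lets a short segment out on its own, while
"nagle" lets one out only when no other short segment is outstanding.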

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-19 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 11:18:48PM +0100, Ingo Molnar wrote:
> 
> On Thu, 18 Jan 2001, Andrea Arcangeli wrote:
> 
> > This is a possible slow (but userspace based) implementation of SIOCPUSH:
> 
> of course this is what i meant. Lets stop wasting time on this, ok?

We were both wrong. Not even my pseudocode was equivalent to SIOCPUSH 8). In fact
this example 1):

val = 1
setsockopt(TCP_CORK, &val)

sendfile()
sendfile()
write()
write()
whatever()

val = 0
setsockopt(TCP_CORK, &val)

is _not_ equivalent to this example 2):

val = 1
setsockopt(TCP_CORK, &val)

sendfile()
sendfile()
write()
write()
whatever()

val = 0
setsockopt(TCP_CORK, &val)
ioctl(SIOCPUSH)

We were wrong because the "uncork" doesn't do what we all expected from it.

The "uncork" won't push the last skb onto the wire if there is unacknowledged
data in the write_queue and the payload of the last skb in the write_queue
is smaller than the MSS. This is because the `uncork' only re-evaluates the
write_queue according to the _nagle_ algorithm, quite correctly, because the
"uncork" moves from "cork" to "nagle" (not from "cork" to "nodelay").

In fact the ugly 3) below is finally equivalent to 2), and it's what we
all expected from 1):

val = 1
setsockopt(TCP_CORK, &val)

sendfile()
sendfile()
write()
write()
whatever()

val = 0
setsockopt(TCP_CORK, &val)
val = 1
setsockopt(TCP_NODELAY, &val)
val = 0
setsockopt(TCP_NODELAY, &val)

The uncork didn't do what we all expected, but I don't consider it a bug in the
uncork itself, because reprocessing the write queue with nagle makes perfect
sense (the socket effectively is in nagle mode, not in nodelay mode). With
SIOCPUSH we don't need to hack the uncork semantics to avoid having to enter
nodelay mode just to push the last skb in the write queue onto the wire, so
there's no practical problem (the socket will remain corked or nagled all the
time anyways).
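
For reference, the tail of example 3) written out with the real setsockopt()
signature. This is a sketch under assumptions: the helper names are invented,
the socket is taken to be corked by the caller, and whether the TCP_NODELAY
pulse really flushes the last segment depends on the kernel (the thread
reports it for 2.4).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set a boolean TCP-level option; 0 on success, -1 with errno set. */
static int set_tcp_flag(int fd, int optname, int on)
{
    return setsockopt(fd, IPPROTO_TCP, optname, &on, sizeof(on));
}

/*
 * Leave cork mode, then pulse TCP_NODELAY on and off so the last
 * partial segment is not left waiting under nagle. The socket ends
 * up in plain nagle mode, as in example 3).
 */
static int uncork_and_push(int fd)
{
    if (set_tcp_flag(fd, TCP_CORK, 0) < 0)
        return -1;
    if (set_tcp_flag(fd, TCP_NODELAY, 1) < 0)
        return -1;
    return set_tcp_flag(fd, TCP_NODELAY, 0);
}
```

That is three system calls for what SIOCPUSH would do in one, which is the
"ugly" part.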

So in short I just wanted to clarify that the "uncorking" doesn't "push" the
write_queue onto the wire ASAP as SIOCPUSH does; it only ensures that the
queued data will arrive at the other end "eventually". SIOCPUSH is instead the
right way to notify the stack that it isn't worth waiting for new outgoing
packets coming from userspace.

However, my implementation of SIOCPUSH from yesterday isn't completely right yet:
mine works fine only if we can send the last fragment immediately (our cwnd
and the receiver's advertised window may forbid that).

The right way to implement SIOCPUSH is to add a flag, tp->push, to `struct
tcp_opt'. It works this way: tp->push is set by SIOCPUSH if something is left
in the write_queue after re-evaluating the write_queue in nodelay mode, as my
patch already does. Then the checks on the write queue triggered by incoming
acks will also check tp->push, and they will treat tp->nonagle as 1 (nodelay)
if tp->push is set. The first send on the socket that arrives from userspace
will clear tp->push. Then SIOCPUSH will work fine with the semantics we
expect.
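
A toy model of that flag, assuming only what the paragraph says. The struct
and function names are hypothetical stand-ins, not kernel code; cork mode and
the real write queue are left out.

```c
/* Toy stand-in for the two fields of struct tcp_opt discussed here. */
struct toy_tp {
    int nonagle;  /* 0 = nagle, 1 = nodelay (cork is left out) */
    int push;     /* set by SIOCPUSH while data is still queued */
};

/* What the ack-driven write-queue checks would treat as tp->nonagle. */
static int effective_nonagle(const struct toy_tp *tp)
{
    return tp->push ? 1 : tp->nonagle;
}

/*
 * SIOCPUSH: the queue has just been re-evaluated in nodelay mode;
 * bytes_left is what cwnd or the receiver window kept from going out.
 */
static void toy_siocpush(struct toy_tp *tp, unsigned int bytes_left)
{
    tp->push = bytes_left > 0;
}

/* The first send from userspace after a push clears the flag. */
static void toy_user_send(struct toy_tp *tp)
{
    tp->push = 0;
}
```

The point of the flag is that a push blocked by the window is remembered and
completed by later acks, instead of silently falling back to nagle.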

I'm too busy with other stuff to spend more time than this on the TCP stack
right now, so if one of the TCP folks wants to reimplement SIOCPUSH
correctly (as described above), that's fine with me. If you can wait I will do
that myself in a few weeks.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread David Ford

dean gaudet wrote:

> the reason, gag puke, is for doing things such as sending "activity"
> progress -- like a line at a time or whatever to indicate that the CGI is
> there and still working.

I understand the gagging on this and generally I agree.  I do appreciate having
the ability to do this however.  In some very infrequent cases it is nice to
have.  Perhaps a distinct method to accomplish this can be created, so that only
this method does this while all the others become more efficient at data transfer?

-d


--
..NOTICE fwd: fwd: fwd: type emails will be deleted automatically.
  "There is a natural aristocracy among men. The grounds of this are
  virtue and talents", Thomas Jefferson [1743-1826], 3rd US President






Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread dean gaudet

ok i'm spouting lots today sorry! :)

can i just say this solves even more problems?

there's a problem with the current apache pipeline code when a
pipeline consists of, say, a light request followed by a heavy request. a
"light" request is, say, a static file, something that essentially is
served immediately.  a "heavy" request could be a dynamic request which
has to go off and fetch something from a distant database.

the heavy request may take several seconds to get the info and form its
response.  all that time apache is delaying the final packet of the first
light request *hoping* that it'll be able to form a nice big buffer for
the kernel.

with SIOCPUSH and nagle this problem disappears.  apache can make sure to
write() to the kernel at the end of each request, no matter how small.
it will SIOCPUSH the response out if there's nothing more in the pipeline,
otherwise it'll proceed to the next request in the pipeline.  if the next
request is heavy, the kernel will eventually push the last request out
when the nagle timer expires... rather than after several seconds or
whenever the heavy response starts to be written.

damn that's sweet.  now if only i had a portable version of this :)

-dean

On Thu, 18 Jan 2001, dean gaudet wrote:

> huh -- i think with this apache could solve the problem documented in
> heidemann's paper while also leaving nagle on... which would solve the CGI
> dribbler vs. bulk problem i just posted about.
>
> at the end of a request apache would check first if it can get another
> request without blocking; if it would block then it issues a SIOCPUSH and
> drops into poll() waiting for a new request.
>
> this means the final packet of a response isn't ever delayed (which is the
> motivation for turning off nagle); and a multi-request pipeline has
> maximal packets... and a dribbling CGI won't cause as many tiny packets.
>
> this may in fact also eliminate the need for CORK, unless anyone can
> really think of an app that wouldn't even want small packets on the wire
> when the server hasn't sent anything for a while.
>
> i like this one :)
>
> -dean
>
> On Thu, 18 Jan 2001, Andrea Arcangeli wrote:
>
> > diff -urN -X /home/andrea/bin/dontdiff 2.4.1pre8/net/ipv4/tcp.c SIOCPUSH/net/ipv4/tcp.c
> > --- 2.4.1pre8/net/ipv4/tcp.c  Wed Jan 17 04:02:38 2001
> > +++ SIOCPUSH/net/ipv4/tcp.c   Thu Jan 18 19:10:14 2001
> > @@ -671,6 +671,11 @@
> >              else
> >                  answ = tp->write_seq - tp->snd_una;
> >              break;
> > +        case SIOCPUSH:
> > +            lock_sock(sk);
> > +            __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk), 1);
> > +            release_sock(sk);
> > +            break;
> >          default:
> >              return(-ENOIOCTLCMD);
> >      };
>




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread dean gaudet

huh -- i think with this apache could solve the problem documented in
heidemann's paper while also leaving nagle on... which would solve the CGI
dribbler vs. bulk problem i just posted about.

at the end of a request apache would check first if it can get another
request without blocking; if it would block then it issues a SIOCPUSH and
drops into poll() waiting for a new request.
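
a rough sketch of that end-of-request check, for illustration only. the
function names are invented, and since SIOCPUSH exists only as the patch
quoted below, a TCP_NODELAY on/off pulse stands in for it here (an
assumption; its effect depends on the kernel).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>

/* 1 if a read on fd would not block right now, 0 if it would, -1 on error. */
static int request_ready(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);   /* zero timeout: just ask, never wait */

    if (n < 0)
        return -1;
    return n > 0 && (pfd.revents & (POLLIN | POLLHUP)) != 0;
}

/*
 * Call after the last byte of a response has been written. If another
 * request is already waiting, return and let it be served so the replies
 * can share packets; otherwise push the tail out before going idle.
 */
static int end_of_response(int fd)
{
    int on = 1, off = 0;
    int ready = request_ready(fd);

    if (ready != 0)
        return ready;           /* 1: more work queued, -1: poll failed */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &off, sizeof(off)) < 0)
        return -1;
    return 0;                   /* pushed; caller may now block in poll() */
}
```

a return of 0 means the tail was pushed and the caller can block waiting for
the next request; 1 means keep going through the pipeline.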

this means the final packet of a response isn't ever delayed (which is the
motivation for turning off nagle); and a multi-request pipeline has
maximal packets... and a dribbling CGI won't cause as many tiny packets.

this may in fact also eliminate the need for CORK, unless anyone can
really think of an app that wouldn't even want small packets on the wire
when the server hasn't sent anything for a while.

i like this one :)

-dean

On Thu, 18 Jan 2001, Andrea Arcangeli wrote:

> diff -urN -X /home/andrea/bin/dontdiff 2.4.1pre8/net/ipv4/tcp.c SIOCPUSH/net/ipv4/tcp.c
> --- 2.4.1pre8/net/ipv4/tcp.c  Wed Jan 17 04:02:38 2001
> +++ SIOCPUSH/net/ipv4/tcp.c   Thu Jan 18 19:10:14 2001
> @@ -671,6 +671,11 @@
>   else
>   answ = tp->write_seq - tp->snd_una;
>   break;
> + case SIOCPUSH:
> + lock_sock(sk);
> + __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk), 1);
> + release_sock(sk);
> + break;
>   default:
>   return(-ENOIOCTLCMD);
>   };




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread dean gaudet

On Thu, 18 Jan 2001, Zach Brown wrote:

> We set TCP_CORK on the socket we handed to external programs that were
> being run via 'site exec' in an ftp server.  It resulted in much nicer
> packets being spit out, especially in the 'ls' case where it likes to
> write() on really goofy boundaries.
>
> [yes, ftp and 'site exec' in particular are far from sexy, but do the same
> with CGI scripts and the world might care :)]

actually in apache we deliberately writev() on (essentially) the same
boundaries the CGI passed to us.

the reason, gag puke, is for doing things such as sending "activity"
progress -- like a line at a time or whatever to indicate that the CGI is
there and still working.

this is obviously something that we really should enable nagle for, and
we've been in a dilemma of whether to nagle or not in this case for a few
years.  we want to not nagle if the CGI is bulk... we want to nagle if the
CGI is a dribbler (because that's what nagle is for).

CORK would probably be wrong for us.

-dean




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread dean gaudet

btw -- i'd like to point out something which some folks are aware of
already.

pipelining was only part of the answer to fixing HTTP/1.0 network
performance problems.  the real answer is a multiplexing protocol such as
WebMUX .  a MUX protocol is more general
than pipelining and keep-alive.

consider a typical web page with a half-dozen embedded images.  while a
single pipelined HTTP/1.1 connection may have the lowest latency to
loading the final byte of the page + images (see the pipelining paper i
referenced below), it doesn't have the best "useable latency".

"useable latency" is intentionally in quotes because it is a subjective
issue of interface design -- a page which loads the critical images early,
such as the first few passes of navigation images is more "useable" than a
page which loads such navigation tools late.  this is the reason
netscape/ie open 4 or more TCP connections to request images in parallel.

incidentally HTML is lacking a UI hint to tell the browser which images
have the highest priority.  (i believe the pipelining paper talks about
this, so there may be proposals for fixing this already.)

at any rate, with a MUX protocol you can get the benefits of pipelining
with the benefits of opening 4 TCP sessions at once.

WebMUX will probably be where we see more benefit of TCP_CORK/MSG_MORE
than we presently do from pipelining.  but we're talking years still...
HTTP/ng hasn't been finalised (because they're being pretty ambitious);
and nobody has decided to just forge forward and layer HTTP/1.1 on top of
WebMUX yet.  (the subversive in me wants to see WebMUX patches for apache,
squid, and mozilla ;)

-dean

On Thu, 18 Jan 2001, dean gaudet wrote:

> On Wed, 17 Jan 2001, Rick Jones wrote:
>
> > > actually the problem isn't nagle...  nagle needs to be turned off for
> > > efficient servers anyhow.
> >
> > i'm not sure I follow that. could you expand on that a bit?
>
> the problem which caused us to disable nagle in apache is documented in
> this paper .  mind you
> i should personally revisit the paper after all these years so that i can
> reconsider its implications in the context of pipelining and webmux.
>
> > on the topic of pipelining - do the pipelined requests tend to be sent
> > or arrive together?
>
> i'm pretty sure the actual use of pipelining is pretty disappointing.
> the work i did in apache preceded the widespread use of HTTP/1.1 and we
> believed it was important to do the "most efficient thing" right out the
> door -- so as to encourage the use of pipelining by proxies in particular.
> the w3c folks, henrik frystyk nielsen in particular, provided most of the
> documentation for this.  their paper is a good read:
> 
>
> > > (the heuristic i use in apache to decide if i need to flush responses in a
> > > pipeline is to look if there are any more requests to read first, and if
> > > there are none then i flush before blocking waiting for new requests.)
> >
> > how often do you find yourself flushing the little bits anyhow?
>
> i'm not aware yet of any study in the field.  and i'm out of touch enough
> with the clients that i don't know if new netscape or IE have finally
> begun to use pipelining (they hadn't as of 1998).
>
> -dean
>




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread dean gaudet

On Wed, 17 Jan 2001, Rick Jones wrote:

> > actually the problem isn't nagle...  nagle needs to be turned off for
> > efficient servers anyhow.
>
> i'm not sure I follow that. could you expand on that a bit?

the problem which caused us to disable nagle in apache is documented in
this paper .  mind you
i should personally revisit the paper after all these years so that i can
reconsider its implications in the context of pipelining and webmux.

> on the topic of pipelining - do the pipelined requests tend to be sent
> or arrive together?

i'm pretty sure the actual use of pipelining is pretty disappointing.
the work i did in apache preceded the widespread use of HTTP/1.1 and we
believed it was important to do the "most efficient thing" right out the
door -- so as to encourage the use of pipelining by proxies in particular.
the w3c folks, henrik frystyk nielsen in particular, provided most of the
documentation for this.  their paper is a good read:


> > (the heuristic i use in apache to decide if i need to flush responses in a
> > pipeline is to look if there are any more requests to read first, and if
> > there are none then i flush before blocking waiting for new requests.)
>
> how often do you find yourself flushing the little bits anyhow?

i'm not aware yet of any study in the field.  and i'm out of touch enough
with the clients that i don't know if new netscape or IE have finally
begun to use pipelining (they hadn't as of 1998).

-dean




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Rick Jones

Olivier Galibert wrote:
> 
> On Thu, Jan 18, 2001 at 10:04:28PM +0100, Andrea Arcangeli wrote:
> > NAGLE algorithm is only one, CORK algorithm is another different algorithm. So
> > probably it would be not appropriate to mix CORK and NAGLE under the name
> > "CONTROL_NAGLING", but certainly I agree they could stay together under another
> > name ;).
> 
> TCP_FLOW_CONTROL ?

then folks would think you were controlling the congestion or "classic"
windows. what all these things do is affect segmentation, so perhaps
TCP_SEGMENT_CONTROL or something to that effect, if anything.

rickjones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Olivier Galibert

On Thu, Jan 18, 2001 at 10:04:28PM +0100, Andrea Arcangeli wrote:
> NAGLE algorithm is only one, CORK algorithm is another different algorithm. So
> probably it would be not appropriate to mix CORK and NAGLE under the name
> "CONTROL_NAGLING", but certainly I agree they could stay together under another
> name ;).

TCP_FLOW_CONTROL ?

  OG.



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Rick Jones

[EMAIL PROTECTED] wrote:
> 
> Hello!
> 
> > So if I understand  all this correctly...
> >
> > The difference in ACK generation
> 
> CORK does not affect receive direction and, hence, ACK generation.

I was asking how the semantics of cork interacted with piggybacking
ACK's on data flowing the other way. Was I wrong in assuming that the
Linux TCP piggybacks ACKs?

rick
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Andrea Arcangeli wrote:

> BTW, the symmetry between getsockopt/setsockopt further underlines how
> SIOCPUSH doesn't fit into the setsockopt options but fits very well
> into the ioctl category instead. There's simply no state one can
> return via getsockopt for the SIOCPUSH functionality. It's not setting
> any option, it's just doing one thing that controls the I/O.

you convinced me. I guess i was just distracted by the common wisdom:
'ioctls are a hack'. But SIOCPUSH *IS* an 'IO control' after all :-)

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 10:57:20PM +0100, Ingo Molnar wrote:
> 
> On Thu, 18 Jan 2001, Andrea Arcangeli wrote:
> 
> > > {
> > > int val = 1;
> > > setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
> > >   (char *)&val,sizeof(val));
> > > val = 0;
> > > setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
> > >   (char *)&val,sizeof(val));
> > > }
> > >
> > > differ from what you posted. It does the same in my opinion. Maybe we are
> > > not talking about the same thing?
> >
> > The above is equivalent to SIOCPUSH _only_ if the caller wasn't using either
> > TCP_NODELAY or TCP_CORK.
> 
> why? I can restore whatever state i want, the above is just a mechanism to

This is a possible slow (but userspace based) implementation of SIOCPUSH:

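{
    int was_set_tcp_cork, was_set_tcp_nodelay, val

    getsockopt(TCP_CORK, &was_set_tcp_cork)
    getsockopt(TCP_NODELAY, &was_set_tcp_nodelay)
    if (was_set_tcp_cork)
        val = 0     /* uncork, then cork again */
    else if (!was_set_tcp_nodelay)
        val = 1     /* cork, then uncork */
    else
        return      /* nodelay: nagle holds nothing back */

    setsockopt(TCP_CORK, &val)
    val = !val      /* restore the state we started from */
    setsockopt(TCP_CORK, &val)
}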

Yours isn't.

BTW, the symmetry between getsockopt/setsockopt further underlines how SIOCPUSH
doesn't fit into the setsockopt options but fits very well into the ioctl
category instead. There's simply no state one can return via getsockopt for
the SIOCPUSH functionality. It's not setting any option, it's just doing one
thing that controls the I/O.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Andrea Arcangeli wrote:

> This is a possible slow (but userspace based) implementation of SIOCPUSH:

of course this is what i meant. Lets stop wasting time on this, ok?

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 09:44:57PM +0100, Ingo Molnar wrote:
> why? TCP_CORK is equivalent to MSG_MORE, it's just a different

I thought you agreed it isn't (Linus's example I quoted).

> > Doing PUSH from setsockopt(TCP_CORK) looked obviously wrong because it
> > isn't setting any socket state, [...]
> 
> well, neither is clearing/setting TCP_CORK ...

clearing/setting TCP_CORK is a stateful operation, it changes a socket option.

> > and also because the SIOCPUSH has nothing specific with TCP_CORK, as
> > said it can be useful also to flush the last fragment of data pending
> > in the send queue without having to wait all the unacknowledged data
> > to be acknowledged from the receiver when TCP_NODELAY isn't set.
> 
> huh? in what way does the following:
> 
> {
> int val = 1;
> setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
>   (char *)&val,sizeof(val));
> val = 0;
> setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
>   (char *)&val,sizeof(val));
> }
> 
> differ from what you posted. It does the same in my opinion. Maybe we are
> not talking about the same thing?

The above is equivalent to SIOCPUSH _only_ if the caller wasn't using either
TCP_NODELAY or TCP_CORK.

> [this is nitpicking. I'm quite sure all the code uses '1' as the value,
> not 2.]

I'm quite sure too, but I would no longer be surprised to get bug reports
because of such an innocent change ;). Though the real reasons are others (I
mentioned the backwards compatibility breakage more as a side note).

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Andrea Arcangeli wrote:

> > {
> > int val = 1;
> > setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
> > (char *)&val,sizeof(val));
> > val = 0;
> > setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
> > (char *)&val,sizeof(val));
> > }
> >
> > differ from what you posted. It does the same in my opinion. Maybe we are
> > not talking about the same thing?
>
> The above is equivalent to SIOCPUSH _only_ if the caller wasn't using either
> TCP_NODELAY or TCP_CORK.

why? I can restore whatever state i want, the above is just a mechanism to
force the push.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 11:52:33AM -0800, Linus Torvalds wrote:
> On Thu, 18 Jan 2001, Ingo Molnar wrote:
> > 
> > i believe a network-conscious application should use MSG_MORE - that has
> > no system-call overhead.
> 
> I think Andrea was thinking more of the case of the anonymous IO
> generator, and having the "controller" program that keeps the socket
> always in CORK mode, but uses SIOCPUSH when it doesn't know what the
> future access patterns will be.

Yes. Yours is an example where TCP_CORK is necessary to make sure we don't
send small packets, and where MSG_MORE can't help.

> Basically, it could use SIOCPUSH whenever its request queue is empty,
> instead of uncorking (and re-corking when the next request comes in).

Exactly.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 11:37:10PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > Doing PUSH from setsockopt(TCP_CORK) looked obviously wrong because it isn't
> > setting any socket state,
> 
> ? 8)

I thought setsockopt is meant to set an option in the socket, something
_stateful_; a PUSH doesn't set any option, it isn't stateful and it _only_
controls the I/O (aka ioctl ;). Anyways either ioctl or setsockopt is fine in
practice, so I've no real problem.

> > and also because the SIOCPUSH has nothing specific
> > with TCP_CORK, as said it can be useful also to flush the last fragment of data
> > pending in the send queue without having to wait all the unacknowledged data to
> > be acknowledged from the receiver when TCP_NODELAY isn't set.
> 
> Andrea, TCP_CORK and TCP_NODELAY is _one_ option, which was split to two
> mostly due to historical reasons. Its real name is TCP_CONTROL_NAGLING
> or something sort of this, only readable. 8)

NAGLE algorithm is only one, CORK algorithm is another different algorithm. So
probably it would be not appropriate to mix CORK and NAGLE under the name
"CONTROL_NAGLING", but certainly I agree they could stay together under another
name ;).

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001 [EMAIL PROTECTED] wrote:

> Actually, TUX-1.1 (Ingo, do I not lie, did you not kill this code?)
> does this. It does not ack quickly, when complete request is received
> and still not answered, so that all the redundant acks disappear.

(it's TUX 2.0 meanwhile), and yes, TUX uses it. We speculatively delay the ACK
of a parsed input packet in the hope of merging it with the first output
packet. If the output frame does not happen for 200 msecs then we send a
standalone ACK to be RFC-conformant. This way TUX can do single-packet web
replies for small requests. (well, plus SYN-ACK and FIN-ACK)

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread kuznet

Hello!

> So if I understand  all this correctly...
> 
> The difference in ACK generation 

CORK does not affect the receive direction and, hence, ACK generation.
The problem is that TCP does not know when the full request has been received,
and it must ack instantly at connection start and after some idle
period to allow the client to open its congestion window. Hence, it has to send
redundant ACKs.

Some control over this is possible only from the level that parses requests,
i.e. from httpd in this case.

Actually, TUX-1.1 (Ingo, do I not lie, did you not kill this code?)
does this. It does not ack quickly, when complete request is received
and still not answered, so that all the redundant acks disappear.

This feature is still inaccessible from user level though.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Andrea Arcangeli wrote:

> Agreed. However since TCP_CORK logic is more generic than MSG_MORE
> [...]

why? TCP_CORK is equivalent to MSG_MORE, it's just a different
representation of the same issue. TCP_CORK needs an extra syscall (in the
case of a push event - which might be rare), the MSG_MORE solution needs
an extra flag (which is merged with other flags in the send() case).

> > i believe it should rather be a new setsockopt TCP_CORK value (or a new
> > setsockopt constant), not an ioctl. Eg. a value of 2 to TCP_CORK could
> > mean 'force packet boundary now if possible, and dont touch TCP_CORK
> > state'.
>

> Doing PUSH from setsockopt(TCP_CORK) looked obviously wrong because it
> isn't setting any socket state, [...]

well, neither is clearing/setting TCP_CORK ...

> and also because the SIOCPUSH has nothing specific with TCP_CORK, as
> said it can be useful also to flush the last fragment of data pending
> in the send queue without having to wait all the unacknowledged data
> to be acknowledged from the receiver when TCP_NODELAY isn't set.

huh? in what way does the following:

    {
        int val = 1;

        /* cork the socket ... */
        setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
                   (char *)&val, sizeof(val));
        val = 0;
        /* ... and uncork it again, which pushes out any pending frame */
        setsockopt(req->sock, IPPROTO_TCP, TCP_CORK,
                   (char *)&val, sizeof(val));
    }

differ from what you posted. It does the same in my opinion. Maybe we are
not talking about the same thing?

> Changing the semantics of setsockopt(TCP_CORK, 2) would also break
> backwards compatibility with all 2.[24].x kernels out there.

[this is nitpicking. I'm quite sure all the code uses '1' as the value,
not 2.]

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 10:59:11PM +0300, [EMAIL PROTECTED] wrote:
> Hello!
> 
> > I'm all for TCP_CORK but it has the disadvantage of two syscalls for doing the
> 
> MSG_MORE was invented to allow to collapse this to 0 of syscalls. 8)

Yes, I know.

> > A new ioctl on the socket should be able to do that (and ioctl looks lighter
> > than a setsockopt, ok ignoring that the VFS is actually grabbing the big lock
> > until we release it in sock_ioctl, ugly, but I feel good ignoring this fact as
> > it will get fixed eventually and this is userspace API that will stay longer).
> 
> setsockopt() exists, which does not have the flaw. (SOL_SOCKET, TCP_DOPUSH)
> or something like this. Actually, I would convert TCP_CORK to set of flags

That is ok for me after all, as said I thought ioctl was conceptually the
right place but setsockopt TCP_PUSH or TCP_DOPUSH are certainly fine too in
practice.

> (1 is reserved for current corking), but I feel this operation is more generic
> and should be moved to SOL_SOCKET level.

Agreed. (btw everything != 0 is reserved for current corking as far as the code
is concerned ;)

> BTW I see no reasons not to move BKL down for ioctl().

No theoretical reason indeed. But practically, moving the BKL down into the
callback means a big patch due to the zillion device drivers out there, in and
out of the tree, and it may end up hurting somebody who is upgrading from 2.4.0
to 2.4.1 and using a driver out of the tree. I believe it's one of the things
that should be done in an unstable branch for those kinds of practical reasons
(however I'm not the one who will complain if that happens during 2.4.x, but I'm
not the one who suggests it either, because then I couldn't complain about the
complaints ;).

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread kuznet

Hello!

> Doing PUSH from setsockopt(TCP_CORK) looked obviously wrong because it isn't
> setting any socket state,

? 8)


> and also because the SIOCPUSH has nothing specific
> with TCP_CORK, as said it can be useful also to flush the last fragment of data
> pending in the send queue without having to wait all the unacknowledged data to
> be acknowledged from the receiver when TCP_NODELAY isn't set.

Andrea, TCP_CORK and TCP_NODELAY is _one_ option, which was split to two
mostly due to historical reasons. Its real name is TCP_CONTROL_NAGLING
or something sort of this, only readable. 8)


> Changing the semantics of setsockopt(TCP_CORK, 2) would also break backwards
> compatibility with all 2.[24].x kernels out there.

2nd ? 8)

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Linus Torvalds wrote:

> I think Andrea was thinking more of the case of the anonymous IO
> generator, and having the "controller" program that keeps the socket
> always in CORK mode, but uses SIOCPUSH when it doesn't know what the
> future access patterns will be.

yep.

> Again, the actual data _senders_ may not be aware of the network
> issues. They are the worker bees, and they may not know or care that
> they are pushing out data to the network.

yep.

> Ingo, you should realize that people actually _want_ to use things
> like stdio. [...]

yep, i already acknowledged that not all applications want to care about
issues like that and rather want to have a 'default behavior' - ie. a
persistent cork.

i also said that user-space (ie. libc) could maintain a persistent flag
itself (a user-space variable) much cheaper than the kernel, and could
pass the current 'more' value to the kernel, whenever sendmsg is done. I
understand that normal file IO has no 'flag' for MSG_MORE - a pity that no
extra flags can be passed in to write(). But this doesnt make it right. It
makes it a practical problem, it shows the (perhaps-) weakness of the file
API which is right now not prepared to pass 'streaming related info' along
with a send, but doesnt make it right.
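
A minimal sketch of the user-space persistent flag described above, for
illustration only. The names `struct ucork`, `ucork_flags()` and
`ucork_write()` are hypothetical and do not come from libc or from this
thread. The library keeps the 'more' state itself and hands it to the kernel
as MSG_MORE on every send(). Note that there is no separate flush step in this
scheme: the caller has to clear `more` before its last write.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical user-space "cork": the persistent state lives in the
 * library, not in the kernel socket. */
struct ucork {
    int fd;    /* connected stream socket */
    int more;  /* nonzero while the caller expects to write again soon */
};

/* Translate the persistent flag into per-call send() flags. */
int ucork_flags(const struct ucork *c)
{
    return c->more ? MSG_MORE : 0;
}

/* Replacement for write(): every send carries the current 'more' hint. */
ssize_t ucork_write(struct ucork *c, const void *buf, size_t len)
{
    assert(c != NULL && c->fd >= 0);
    return send(c->fd, buf, len, ucork_flags(c));
}
```

A caller would set `more = 1` once, write headers and body pieces through
`ucork_write()`, and set `more = 0` just before the final piece. This only
helps code that goes through such a wrapper; a plain write() on the same
descriptor carries no hint at all, which is the limitation discussed above.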

now if your point is that passing a flag (or flags) along every (generic)
file-write would be a mistake, that would be a point. But you didnt say
that so far.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 08:43:47PM +0100, Ingo Molnar wrote:
> 
> On Thu, 18 Jan 2001, Andrea Arcangeli wrote:
> 
> > > I'm all for TCP_CORK but it has the disadvantage of two syscalls for
> > doing the flush of the outgoing queue to the network. And one of those
> > two syscalls is spurious. [...]
> 
> i believe a network-conscious application should use MSG_MORE - that has
> no system-call overhead.

Agreed. However, since the TCP_CORK logic is more generic than MSG_MORE and so
still makes sense for some usages, I think it is worth optimizing the TCP_CORK
logic too, and this new functionality _may_ be useful not just for TCP_CORK.

> > +   case SIOCPUSH:
> > +   lock_sock(sk);
> > +   __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk), 1);
> > +   release_sock(sk);
> > +   break;
> 
> i believe it should rather be a new setsockopt TCP_CORK value (or a new
> setsockopt constant), not an ioctl. Eg. a value of 2 to TCP_CORK could
> mean 'force packet boundary now if possible, and dont touch TCP_CORK
> state'.

Doing PUSH from setsockopt(TCP_CORK) looked obviously wrong because it isn't
setting any socket state, and also because the SIOCPUSH has nothing specifically
to do with TCP_CORK: as said, it can also be useful to flush the last fragment of
data pending in the send queue without having to wait for all the unacknowledged
data to be acknowledged by the receiver when TCP_NODELAY isn't set.

Changing the semantics of setsockopt(TCP_CORK, 2) would also break backwards
compatibility with all 2.[24].x kernels out there.

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread kuznet

Hello!

> Ingo, you should realize

Ingo did not try to argue. I do not too.

This is right, no doubts.


> Mantra: "everything is a stream of bytes". Repeat until enlightened.

... but the devil invented record marks and pushes, seduced mankind
and we were evicted from paradise. 8)

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread kuznet

Hello!

> I'm all for TCP_CORK but it has the disadvantage of two syscalls for doing the

MSG_MORE was invented to allow collapsing this to zero syscalls. 8)


> A new ioctl on the socket should be able to do that (and ioctl looks lighter
> than a setsockopt, ok ignoring that the VFS is actually grabbing the big lock
> until we release it in sock_ioctl, ugly, but I feel good ignoring this fact as
> it will get fixed eventually and this is userspace API that will stay longer).

setsockopt() exists, which does not have the flaw. (SOL_SOCKET, TCP_DOPUSH)
or something like this. Actually, I would convert TCP_CORK to set of flags
(1 is reserved for current corking), but I feel this operation is more generic
and should be moved to SOL_SOCKET level.

BTW I see no reason not to move the BKL down for ioctl().
It is not rocket science, just a plain dumb edit.

Alexey



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Linus Torvalds



On Thu, 18 Jan 2001, Ingo Molnar wrote:
> 
> i believe a network-conscious application should use MSG_MORE - that has
> no system-call overhead.

I think Andrea was thinking more of the case of the anonymous IO
generator, and having the "controller" program that keeps the socket
always in CORK mode, but uses SIOCPUSH when it doesn't know what the
future access patterns will be.

Basically, it could use SIOCPUSH whenever its request queue is empty,
instead of uncorking (and re-corking when the next request comes in).
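
For comparison, a sketch of what such a controller has to do without
SIOCPUSH, using only the existing TCP_CORK socket option. The helper names
`cork_set()`, `cork_flush()` and `cork_get()` are hypothetical; the uncork
followed by an immediate re-cork is the two-syscall sequence that the
proposed ioctl is meant to replace.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK; 0 on success, -1 on error. */
int cork_set(int fd, int on)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical stand-in for the proposed SIOCPUSH: push queued data out,
 * then go straight back to corked mode. Costs two syscalls. */
int cork_flush(int fd)
{
    if (cork_set(fd, 0) < 0)   /* uncorking flushes any partial frame */
        return -1;
    return cork_set(fd, 1);    /* re-cork for the next request */
}

/* Read back the TCP_CORK state: 1 corked, 0 uncorked, -1 on error. */
int cork_get(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

The controller would call `cork_set(fd, 1)` once after accept() and
`cork_flush(fd)` whenever its request queue runs empty; the worker programs
writing to the descriptor need no changes.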

Again, the actual data _senders_ may not be aware of the network issues.
They are the worker bees, and they may not know or care that they are
pushing out data to the network. 

Ingo, you should realize that people actually _want_ to use things like
stdio. Not everything is a Tux web-server. There are specialized servers
out there, and there are tons of people who prefer to use "fprintf()"
and let the library handle buffering etc.

Very few people want to use send() directly.

Mantra: "everything is a stream of bytes". Repeat until enlightened.

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Linus Torvalds



On Thu, 18 Jan 2001, Andrea Arcangeli wrote:
> 
> I'm all for TCP_CORK but it has the disadvantage of two syscalls for doing the
> flush of the outgoing queue to the network. And one of those two syscalls is
> spurious. Certainly it makes perfect sense that the uncork flushes the outgoing
> queue, but I think we should have a way to flush it without exiting the cork-mode.
> I believe most software only needs to SIOCPUSH after sending the data and just
> before waiting the reply.

Sure, I agree. Something like SIOCPUSH would fit very well into the
TCP_CORK mentality.

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andi Kleen

On Thu, Jan 18, 2001 at 10:20:16AM -0800, Rick Jones wrote:
> Andi Kleen wrote:
> > 
> > On Wed, Jan 17, 2001 at 02:17:36PM -0800, Rick Jones wrote:
> > > How does CORKing interact with ACK generation? In particular how it
> > > might interact with (or rather possibly induce) standalone ACKs?
> > 
> > It doesn't change the ACK generation. If your cork'ed packets gets sent
> > before the delayed ack triggers it is piggy backed, if not it is send
> > individually. When the delayed ack triggers depends; Linux has dynamic
> > delack based on the rtt and also a special quickack mode to speed up slow
> > start.
> 
> So if I understand  all this correctly...
> 
> The difference in ACK generation would be that with nagle it is a race
> between the standalone ack heuristic and the first byte of response
> data, with cork, the race is between the standalone ack heuristic and
> the last byte of response data and an uncork call, or the MSSth byte
> whichever comes first.

For the case of the cork'ed packet being at the beginning of the connection
(as http/1.0) then cork will not help much, because quickack will send an
ack immediately, but the write only occurs after the process got woken up.

For later cases it depends on the ack state -- 2.4 added more complicated
ack state (e.g "pingpong mode") to fix a few performance problems with the 
normal delack in uncommon situations.  In pingpong mode ack happens very
quickly, too fast for cork. 

I suspect at least persistent HTTP will be always affected by one of these
and generate more acks.




-Andi



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Andrea Arcangeli wrote:

> I'm all for TCP_CORK but it has the disadvantage of two syscalls for
> doing the flush of the outgoing queue to the network. And one of those
> two syscalls is spurious. [...]

i believe a network-conscious application should use MSG_MORE - that has
no system-call overhead.

> + case SIOCPUSH:
> + lock_sock(sk);
> + __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk), 1);
> + release_sock(sk);
> + break;

i believe it should rather be a new setsockopt TCP_CORK value (or a new
setsockopt constant), not an ioctl. Eg. a value of 2 to TCP_CORK could
mean 'force packet boundary now if possible, and dont touch TCP_CORK
state'.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andrea Arcangeli

On Thu, Jan 18, 2001 at 08:49:38AM -0800, Linus Torvalds wrote:
> state. However, the fact is that you _need_ the persistency of a socket
> option if you want to take advantage of external programs etc getting good
> behaviour without having to know that they are talking to a socket. 

I'm all for TCP_CORK but it has the disadvantage of two syscalls for doing the
flush of the outgoing queue to the network. And one of those two syscalls is
spurious. Certainly it makes perfect sense that the uncork flushes the outgoing
queue, but I think we should have a way to flush it without exiting the cork-mode.
I believe most software only needs to SIOCPUSH after sending the data and just
before waiting for the reply.

A new ioctl on the socket should be able to do that (and ioctl looks lighter
than a setsockopt, ok ignoring that the VFS is actually grabbing the big lock
until we release it in sock_ioctl, ugly, but I feel good ignoring this fact as
it will get fixed eventually and this is userspace API that will stay longer).

[ I ignored the ioctl #define mess so it can't compile ]

diff -urN -X /home/andrea/bin/dontdiff 2.4.1pre8/net/ipv4/tcp.c SIOCPUSH/net/ipv4/tcp.c
--- 2.4.1pre8/net/ipv4/tcp.c    Wed Jan 17 04:02:38 2001
+++ SIOCPUSH/net/ipv4/tcp.c     Thu Jan 18 19:10:14 2001
@@ -671,6 +671,11 @@
                else
                        answ = tp->write_seq - tp->snd_una;
                break;
+       case SIOCPUSH:
+               lock_sock(sk);
+               __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk), 1);
+               release_sock(sk);
+               break;
        default:
                return(-ENOIOCTLCMD);
        };

The SIOCPUSH makes sense and it "may" be useful also when not using TCP_CORK
(only with the Nagle algorithm): it allows the sender to push the last frame of
the message onto the wire without waiting for the not yet acknowledged data to
be acknowledged (and without having to disable the Nagle algorithm on the socket
to achieve that).

SIOCPUSH is indeed a noop if the user set TCP_NODELAY in the socket options.
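
SIOCPUSH exists only as the patch above, so it cannot be called from a stock
kernel. As a rough user-space approximation, and only under the assumption
that the kernel behaves as later Linux tcp(7) man pages describe (setting
TCP_NODELAY forces a flush of pending output even while TCP_CORK is set), a
push could be sketched as below. `tcp_push_now()` is a hypothetical name, and
the helper assumes Nagle was enabled beforehand, since it switches TCP_NODELAY
back off unconditionally.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical push helper: flush pending output without touching the
 * TCP_CORK state. Returns 0 on success, -1 on error. */
int tcp_push_now(int fd)
{
    int on = 1, off = 0;

    assert(fd >= 0);
    /* Assumption: setting TCP_NODELAY flushes the queue, even when corked. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;
    /* Restore Nagle so later small writes are coalesced again. */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &off, sizeof(off));
}
```

This is still two system calls, so it does not remove the overhead the patch
is aimed at; it only avoids leaving cork mode.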

Andrea



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Linus Torvalds



On Thu, 18 Jan 2001, Rick Jones wrote:

> Linus Torvalds wrote:
> > Remember the UNIX philosophy: everything is a file.
> 
> ...and a file is simply a stream of bytes (iirc?)

Indeed.

And normal applications really shouldn't need to worry about things like
packetization etc.

Of course, many applications still do. stdio does "fstat" on the file
descriptor to get the st_blksize thing - which despite its name is really
only meant to say "this is an efficient blocksize to write to this fd".
That only really works for regular files, and is just a heuristic even
there.
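
For readers who want to see the hint in question, a small sketch that reads
st_blksize for the descriptor behind a stdio stream. The function name
`stdio_blksize_hint()` is hypothetical; whether and how a given libc actually
uses this value for its buffer size is not something this sketch asserts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the preferred I/O size reported for the stream's descriptor,
 * or -1 if fstat() fails. */
long stdio_blksize_hint(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) < 0)
        return -1;
    return (long)st.st_blksize;
}
```

Calling it on streams backed by a regular file and by a socket shows what the
application gets to see in each case; nothing in the value says anything about
packet boundaries.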

But TCP_CORK can be used to kind of "wrap" such applications, if you know
that they don't have interactive behaviour. 

99% of the time you probably don't care enough. Not very many people use
TCP_CORK, I suspect. It's too Linux-specific, and you really have to watch
the packets on the network to see the effect of it (unless you use it
wrong, and forget to uncork, in which case you can certainly see the
effect of it the wrong way ;)

Oh, well. The same is obviously largely true of "sendfile()" in general.
The people who use sendfile() under Linux are probably largely the same
people who know about and use TCP_CORK.

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Rick Jones

Linus Torvalds wrote:
> Remember the UNIX philosophy: everything is a file.

...and a file is simply a stream of bytes (iirc?)

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Rick Jones

Ingo Molnar wrote:
> 
> On Wed, 17 Jan 2001, Rick Jones wrote:
> 
> > i'd heard interesting generalities but no specifics. for instance,
> > when the send is small, does TCP wait exclusively for the app to
> > flush, or is there an "if all else fails" sort of timer running?
> 
> yes there is a per-socket timer for this. According to RFC 1122 a TCP
> stack 'MUST NOT' buffer app-sent TCP data indefinitely if the PSH bit
> cannot be explicitly set by a SEND operation. Was this a trick question?
> :-)

Nope, not a trick question. The nagle heuristic means that small sends
will not wait indefinitely since sending the first small bit of data
starts the retransmission timer as a course of normal processing. So, I
am not in the habit of thinking about a "clear the buffer" timer being
set when a small send takes place but no transmit happens.

rick jones

btw, as I'm currently on linux-kernel, no need to cc me :)
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Rick Jones

Andi Kleen wrote:
> 
> On Wed, Jan 17, 2001 at 02:17:36PM -0800, Rick Jones wrote:
> > How does CORKing interact with ACK generation? In particular how it
> > might interact with (or rather possibly induce) standalone ACKs?
> 
> It doesn't change the ACK generation. If your cork'ed packets gets sent
> before the delayed ack triggers it is piggy backed, if not it is send
> individually. When the delayed ack triggers depends; Linux has dynamic
> delack based on the rtt and also a special quickack mode to speed up slow
> start.

So if I understand  all this correctly...

The difference in ACK generation would be that with nagle it is a race
between the standalone ack heuristic and the first byte of response
data, with cork, the race is between the standalone ack heuristic and
the last byte of response data and an uncork call, or the MSSth byte
whichever comes first.

If the response bytes are dribbling slowly into the socket, where slowly
is less than the bandwidth delay product of the connection, cork can
result in considerably fewer packets than nagle would. It would perhaps
have one more standalone ACK than nagle, though.

If the response bytes are dribbling quickly into the socket, where
quickly is greater than the bandwidth delay product of the connection,
cork will produce one less packet than nagle.

If the response bytes go into the socket together, cork and nagle will
produce the same number of packets.

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Zach Brown

On Thu, Jan 18, 2001 at 08:49:38AM -0800, Linus Torvalds wrote:

> That has its advantages: it's a very local thing, and doesn't need any
> state. However, the fact is that you _need_ the persistency of a socket
> option if you want to take advantage of external programs etc getting good
> behaviour without having to know that they are talking to a socket. 

*nod*

We set TCP_CORK on the socket we handed to external programs that were
being run via 'site exec' in an ftp server.  It resulted in much nicer
packets being spit out, especially in the 'ls' case where it likes to
write() on really goofy boundaries.
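
A sketch of that pattern, under assumptions: the wrapper name `run_corked()`
is hypothetical and this is not the ftp server's actual code. The parent
corks the socket, runs the external program with the socket as its stdout,
and uncorks once the child has exited so the last partial packet goes out.
The setsockopt() calls are deliberately best effort, so the same wrapper
still runs when the descriptor is not a TCP socket.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: run argv[0] with 'sock' as its stdout, corked for
 * the duration. Returns the child's wait status, or -1 on error. */
int run_corked(int sock, char *const argv[])
{
    int on = 1, off = 0, status = 0;
    pid_t pid, waited;

    assert(sock >= 0 && argv != NULL && argv[0] != NULL);

    /* Best effort: on a non-TCP descriptor this fails and we carry on. */
    (void)setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it only sees stdout and never learns it is a socket. */
        if (dup2(sock, STDOUT_FILENO) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }
    waited = waitpid(pid, &status, 0);

    /* Uncork: flush whatever partial packet the child left behind. */
    (void)setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
    return waited < 0 ? -1 : status;
}
```

The child can write() on whatever boundaries it likes; the stack only sends
full frames until the parent uncorks.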

[yes, ftp and 'site exec' in particular are far from sexy, but do the same
with CGI scripts and the world might care :)]

- z



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Linus Torvalds wrote:

> Remember the UNIX philosophy: everything is a file. MSG_MORE
> completely breaks that, because the only way to use it is with
> send[msg](). It's absolutely unusable with something like a
> traditional UNIX "anonymous" application that doesn't know or care
> that it's writing to the network.

yep you are right - i only thought in terms of applications that know that
they are dealing with a network.

> In contrast, TCP_CORK has an interface much like TCP_NOPUSH, along
> with the notion of persistency. The difference between those two is
> that TCP_CORK really took the notion of persistency to the end, and
> made uncorking actually say "Ok, no more packets". You can't do that
> with TCP_NOPUSH: with TCP_NOPUSH you basically have to know what your
> last write is, and clear the bit _before_ that write if you want to
> avoid bad latencies (alternatively, you can just close the socket,
> which works equally well, and was probably the designed interface for
> the thing. That has the disadvantage of, well, closing the socket - so
> it doesn't work if you don't _know_ whether you'd write more or not).

i believe BSD's TCP_NOPUSH should add those 3 lines that are needed to
flush pending packets, this is what we do too - we do a
tcp_push_pending_frames() if the socket option TCP_CORK is cleared.

> So the three are absolutely not equivalent. I personally think that
> TCP_NOPUSH is always the wrong thing - it has the persistency without
> the ability to shut it off gracefully after the fact. In contrast,
> both MSG_MORE and TCP_CORK have well-defined behaviour but they have
> very different uses.

yep - i agree now. In terms of network-aware applications, i found
MSG_MORE to be both cheaper and less bug-prone - but for uncooperative (or
simply too generic) applications which are output-ing to simple files
there is no way to control buffering, only some persistent mechanism.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Thu, 18 Jan 2001, Linus Torvalds wrote:

> Yeah, and how are you going to teach a perl CGI script that writes to
> stdout to use it?

yep, correct. But you can have TCP_CORK behavior from user-space (by
setting the cork flag in user-space and writing it for all network
output), while you cannot have MSG_MORE in the TCP_CORK case. And a perl
script will likely use none of these mechanisms, it's the webserver CGI
host code that does the network send, perl CGI scripts do not send to the
network directly, they send to a pipe so the CGI host code can have
absolute control over eg. CGI-generated HTTP headers.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Linus Torvalds



On Thu, 18 Jan 2001, Ingo Molnar wrote:
> 
> Basically MSG_MORE is a simplified probability distribution of the next
> SEND, and it already covers all the other (iovec, nagle, TCP_CORK)
> mechanisms available, in a surprisingly easy way IMO. I believe MSG_MORE is
> very robust from a theoretical point of view.

Yeah, and how are you going to teach a perl CGI script that writes to
stdout to use it?

Face it, it's limited. It has, in fact, many of the same limitations
TCP_NOPUSH has.

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Linus Torvalds



On Thu, 18 Jan 2001, Ingo Molnar wrote:
>
> [ BSD's TCP_NOPUSH ] 
>
> this is what MSG_MORE does. Basically i added MSG_MORE for the purpose of
> getting perfect TUX packet boundaries (and was ignorant enough to not know
> about BSD's NOPUSH), without an additional system-call overhead, and
> without the persistency of TCP_CORK. Alexey and David agreed, and actually
> implemented it correctly :-)

MSG_MORE is very different from TCP_NOPUSH, which is very different from
TCP_CORK.

First off, the interfaces are very different. MSG_MORE is a "this write
will be followed by more writes", and only works on programs that know
that they are writing to a socket.

That has its advantages: it's a very local thing, and doesn't need any
state. However, the fact is that you _need_ the persistency of a socket
option if you want to take advantage of external programs etc getting good
behaviour without having to know that they are talking to a socket. 

Remember the UNIX philosophy: everything is a file. MSG_MORE completely
breaks that, because the only way to use it is with send[msg](). It's
absolutely unusable with something like a traditional UNIX "anonymous"
application that doesn't know or care that it's writing to the network.

So while MSG_MORE has uses, it's absolutely and utterly wrong to say that
it is equivalent to either TCP_NOPUSH or TCP_CORK.

Now, I'll agree that TCP_NOPUSH actually has the same _logic_ as MSG_MORE:
you can basically say that the two are more or less equivalent by a source
transformation (ie send(MSG_MORE) => "set TCP_NOPUSH + send() + clear
TCP_NOPUSH"). Both of them are really fairly "local", but TCP_NOPUSH has a
_notion_ of persistency that is entirely lacking in MSG_MORE.
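
To make the "local" nature of MSG_MORE concrete, here is a minimal sketch of a
reply sent as two send() calls, with the flag on the first one only. The
helper names (`tcp_loopback_pair()`, `send_reply()`, `recv_all()`) are
hypothetical, and the loopback pair exists purely so the sketch can be run on
one machine. Whether the two writes really leave as a single segment can only
be confirmed by watching the packets on the wire.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connected TCP pair over 127.0.0.1.
 * On success returns 0 with fds[0] = client and fds[1] = server side. */
int tcp_loopback_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }
    fds[0] = socket(AF_INET, SOCK_STREAM, 0);
    if (fds[0] < 0 ||
        connect(fds[0], (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (fds[0] >= 0)
            close(fds[0]);
        close(lfd);
        return -1;
    }
    fds[1] = accept(lfd, NULL, NULL);
    close(lfd);
    if (fds[1] < 0) {
        close(fds[0]);
        return -1;
    }
    return 0;
}

/* Header first with MSG_MORE ("more data follows"), body last without it. */
int send_reply(int fd, const char *hdr, const char *body)
{
    assert(hdr != NULL && body != NULL);
    if (send(fd, hdr, strlen(hdr), MSG_MORE) < 0)
        return -1;
    return send(fd, body, strlen(body), 0) < 0 ? -1 : 0;
}

/* Read exactly n bytes; 0 on success, -1 on error or early EOF. */
int recv_all(int fd, char *buf, size_t n)
{
    size_t got = 0;

    while (got < n) {
        ssize_t r = recv(fd, buf + got, n - got, 0);
        if (r <= 0)
            return -1;
        got += (size_t)r;
    }
    return 0;
}
```

Nothing persists on the socket after `send_reply()` returns, which is exactly
why a program that only knows write() cannot take part.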

In contrast, TCP_CORK has an interface much like TCP_NOPUSH, along with
the notion of persistency. The difference between those two is that
TCP_CORK really took the notion of persistency to the end, and made
uncorking actually say "Ok, no more packets". You can't do that with
TCP_NOPUSH: with TCP_NOPUSH you basically have to know what your last
write is, and clear the bit _before_ that write if you want to avoid bad
latencies (alternatively, you can just close the socket, which works
equally well, and was probably the designed interface for the thing. That
has the disadvantage of, well, closing the socket - so it doesn't work if
you don't _know_ whether you'd write more or not).

So the three are absolutely not equivalent. I personally think that
TCP_NOPUSH is always the wrong thing - it has the persistency without the
ability to shut it off gracefully after the fact. In contrast, both
MSG_MORE and TCP_CORK have well-defined behaviour but they have very
different uses.

Linus





Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andi Kleen

On Thu, Jan 18, 2001 at 02:06:46PM +0100, Ingo Molnar wrote:
> 
> On Wed, 17 Jan 2001, Rick Jones wrote:
> 
> > i'd heard interesting generalities but no specifics. for instance,
> > when the send is small, does TCP wait exclusively for the app to
> > flush, or is there an "if all else fails" sort of timer running?
> 
> yes there is a per-socket timer for this. According to RFC 1122 a TCP
> stack 'MUST NOT' buffer app-sent TCP data indefinitely if the PSH bit
> cannot be explicitly set by a SEND operation. Was this a trick question?
> :-)

Are you sure? The retransmit timer is not necessarily started and I don't
see any other timer in 2.2 or plain 2.4 that would do that.


-Andi



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Wed, 17 Jan 2001, Rick Jones wrote:

> certainly, i see by your examples how cork can make life easier on the
> developer - they can putc() the reply if they want. for a persistent
> http connection, there would be the cork and uncork each time, for a
> pipelined connection, it is basically a race - how does the client
> present requests to the connection, what are the speeds of that
> connection relative to the speed of the server getting replies into
> the socket that sort of thing.

such dynamic properties should IMO never become visible to user-space
interfaces. TCP_CORK/MSG_MORE (which are both the same thing, in
a different interface) are a way to specify logical neighborhood of
app-side SENDs. I believe the most sensible and generic thing to do is to
require MSG_MORE information from the application: 'is it likely that the
application is going to SEND something soon, or not?'.

Submitting an exact timetable of planned future SENDs (with a fully
specified probability distribution of every expected future SEND event)
would be the most informative thing to do, but this is not very practical.

Basically MSG_MORE is a simplified probability distribution of the next
SEND, and it already covers all the other (iovec, nagle, TCP_CORK)
mechanisms available, in a surprisingly easy way IMO. I believe MSG_MORE is
very robust from a theoretical point of view.

To use this information to judge saturation situations properly is
completely up to the stack.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Wed, 17 Jan 2001, Linus Torvalds wrote:

> (I also had one person point out that BSD's have the notion of
> TCP_NOPUSH, which does almost what TCP_CORK does under Linux, except
> it doesn't seem to have the notion of uncorking - you can turn NOPUSH
> off, but apparently it doesn't affect queued packets. This makes it
> even less clear why they have the ugly sendfile)

this is what MSG_MORE does. Basically i added MSG_MORE for the purpose of
getting perfect TUX packet boundaries (and was ignorant enough to not know
about BSD's NOPUSH), without an additional system-call overhead, and
without the persistency of TCP_CORK. Alexey and David agreed, and actually
implemented it correctly :-)

basically if MSG_MORE is not set that means an explicit packet boundary in
the noncontended scenario. If MSG_MORE is set then that means all full-MSS
packets are queued, partial packets are not queued (but are timing out).
sendfile() uses the more flag internally - i've changed sendfile() in my
tree to specify the more flag from higher levels as well - eg. if a sent
file is embedded into other replies, or multiple files are sent.

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Wed, 17 Jan 2001, Rick Jones wrote:

> i'd heard interesting generalities but no specifics. for instance,
> when the send is small, does TCP wait exclusively for the app to
> flush, or is there an "if all else fails" sort of timer running?

yes there is a per-socket timer for this. According to RFC 1122 a TCP
stack 'MUST NOT' buffer app-sent TCP data indefinitely if the PSH bit
cannot be explicitly set by a SEND operation. Was this a trick question?
:-)

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Ingo Molnar


On Wed, 17 Jan 2001, dean gaudet wrote:

> with the linux TCP_CORK API you only get one trailing small packet.
> in case you haven't heard of TCP_CORK -- when the cork is set, the
> kernel is free to send any maximum size packets it can form, but has
> to hold on to the stragglers until userland gives it more data or pops
> the cork.

TCP_CORK has been basically replaced by MSG_MORE these days. The problem
with the cork approach is that it's a persistent socket flag - and it
easily triggers programming errors when the application writer tracks the
state of the cork incorrectly. Also, removing the cork is one extra
system-call. So what you can do with MSG_MORE is to specify at
sendmsg()/writev() time, whether at the end of the buffer there is a cork
or not.

this is what TUX uses. When e.g. a static HTTP request arrives it sends
reply headers shortly after having checked file permissions and stuff (but
the file is not yet sent), with MSG_MORE set. Then it sends the file, and
sendfile() keeps MSG_MORE set right until the end of the request, when it
clears it for the last fragment so the last partial packet gets flushed to
the network. In fact there is one more optimization here, if the request
is not keepalive then TUX still keeps MSG_MORE set, and closes the socket -
which will implicitly flush the output queue anyway and send any partial
packet, but will also have the FIN packet merged with the last outgoing
packet.
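
A minimal sketch of the header-then-body pattern described above, assuming
a Linux socket and plain send() calls. The helper names send_all() and
send_reply() are hypothetical and are not taken from TUX; TUX does this
inside the kernel and uses sendfile() for the body, which this sketch does
not reproduce.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Send len bytes with the given flags, retrying after short writes.
 * Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len, int flags)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, flags);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: headers go out with MSG_MORE ("more data is
 * coming, do not flush a partial packet yet"); the body goes out with
 * no flag, which marks the end of the logical message. */
int send_reply(int fd, const char *hdr, size_t hlen,
               const char *body, size_t blen)
{
    if (send_all(fd, hdr, hlen, MSG_MORE) < 0)
        return -1;
    return send_all(fd, body, blen, 0);
}
```

No persistent socket state is left behind here: the hint travels with each
call, which is the property contrasted with TCP_CORK above.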

(if there is saturation then further merging might happen as well - if a
sendmsg() comes before a partial, but already xmit-queued packet is sent,
then the TCP layer merges this sendmsg() output with the outgoing packet.)

Ingo




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-18 Thread Andi Kleen

On Wed, Jan 17, 2001 at 02:17:36PM -0800, Rick Jones wrote:
> How does CORKing interact with ACK generation? In particular how it
> might interact with (or rather possibly induce) standalone ACKs?

It doesn't change the ACK generation. If your corked packet gets sent
before the delayed ACK triggers, the ACK is piggybacked on it; if not, the
ACK is sent individually. When the delayed ACK triggers depends: Linux has
dynamic delack based on the RTT and also a special quickack mode to speed up
slow start.


-Andi



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Jonathan Walther

They have it because they heard Linux had it and they wanted to do
well in the next round of Mindcraft-like benchmarks. (sendfile() that is)

Jonathan

On Wed, 17 Jan 2001, Linus Torvalds wrote:
> (I also had one person point out that BSD's have the notion of TCP_NOPUSH,
> which does almost what TCP_CORK does under Linux, except it doesn't seem
> to have the notion of uncorking - you can turn NOPUSH off, but apparently
> it doesn't affect queued packets. This makes it even less clear why they
> have the ugly sendfile)





Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Linus Torvalds



On Wed, 17 Jan 2001, Rick Jones wrote:
> > 
> >  (a) make sure that system call latency is low enough that there really
> >  aren't any major reasons to avoid system calls. They're just function
> >  calls - they may be a bit heavier than most functions, of course, but
> >  people shouldn't need to avoid them like the plague like on some
> >  systems.
> 
> i'm not quite sure how it plays here, but someone once told me that the
> most efficient procedure call was the one that was never made :)

Absolutely.

But I'm also a firm believer in "simplicity makes performance". 

My personal problem (and maybe it really is just me) with sendmsg() and
writev() kind of scatter-gather interfaces is that I think they are hard
and non-intuitive to use. They work beautifully if you design with them in
mind, and your data really is fundamentally already laid out in memory.

But they tend to be a bit too complicated if you have to do things like
"sprintf()" to generate part of the data first, and if you don't know
where you'll get your data before it is generated etc. For example, the
whole writev()/sendfile() kind of approach just _totally_ breaks down when
you have things like CGI involved.

Basically, I think the scatter-gather interfaces are too inflexible: they
are designed for one thing, and one thing only, and it's hard to use them
for anything else. And being hard to use means that people will do
non-obvious things, or just ignore them. Both of which will be bad for
performance in the long run. If you try to be clever, the program gets
harder to maintain, and because of that you can't do the good kinds of
re-organizations that might improve it.

The true power of TCP_CORK is when you really start thinking about what it
means that you can do _any_ output. Suddenly, you can have perl CGI stuff,
that uses stdio or something even more primitive that doesn't do buffering
at all - and it will automatically look ok on the wire.

> How "bulk" is a bulk transfer in your thinking? By the time the transfer
> gets above something like 100*MSS I would think that the first small
> packet would become epsilon. 

Actually, I don't really mean "bulk" as in "big", but more as in
"noninteractive". The biggest advantage of things like TCP_CORK is exactly
for small files or smallish CGI output, where it makes a difference
whether you sent out 4 big packets or 5 half-sized packets.
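
The packet-count difference can be illustrated with a little arithmetic.
This is only a sketch under simplifying assumptions: it models the worst
case in which every write is flushed on its own boundary, and compares it
with the case where the stack may coalesce all the data into full segments.
The function names and the example sizes (a 300-byte header, a 5500-byte
body, an MSS of 1460) are illustrative and do not come from the thread.

```c
#include <stddef.h>

/* Segments needed when each write ends on its own packet boundary:
 * every write is rounded up to whole segments separately. */
size_t segments_separate(const size_t *writes, size_t n, size_t mss)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (writes[i] + mss - 1) / mss;
    return total;
}

/* Segments needed when all writes may be coalesced (for example while
 * corked): only the grand total is rounded up. */
size_t segments_coalesced(const size_t *writes, size_t n, size_t mss)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++)
        bytes += writes[i];
    return (bytes + mss - 1) / mss;
}
```

With writes of 300 and 5500 bytes and an MSS of 1460, the first function
gives 5 segments and the second gives 4. For a transfer this small, that
single extra packet is a noticeable fraction of the total.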

> How does CORKing interact with ACK generation? In particular how it
> might interact with (or rather possibly induce) standalone ACKs?

If anything, it should reduce ACK's too, simply because it reduces the
number of packets. But with most people doing delayed ACKs for every 2 MSS
of data (or whatever the RFC's specify), this is probably not really much
of an issue.

> so after i present each reply, i'm checking to see if there is another
> request and if there is not i have to uncork to get the residual data to
> flow.

Another way of thinking about it - you just know when the connection is
idle, and you uncork.

But note that you don't _have_ to be clever, if you don't want to. You can
just uncork after each transfer, and you'll still do no worse than if you
never corked at all. And you'll have all the advantages of being able to
not worry about how your CGI scripts etc work together.

> But does the server know the arrival pattern (well time distribution) of
> requests? It seems that one depends on a client being helpful about
> getting requests to the server in groups otherwise one is corking and
> uncorking the connection.

Oh, best performance definitely depends on the client interleaving the
requests. What else is new?

TCP_CORK is not going to suddenly make your application never have to
think about performance ever again any more. That's obvious. It is nothing
but a tool in your tool-chest. It's a tool with a very simple interface,
and it's rather generic. Which is why it's so powerful. But it's not a
panacea.

I'm claiming that with TCP_CORK, it's fairly obvious how to write a server
that _can_ take advantage of a pipelined client. 

In contrast, with a writev/sg-sendfile kind of interface it would be much
more painful. You'd have to explicitly buffer up your replies all the
time, which creates much more interesting (read: bug-prone) memory
management issues, AND makes it a real bitch to handle things like
external CGI stuff etc.

But no, let's not claim that TCP_CORK solves the problem of world hunger..

(I also had one person point out that BSD's have the notion of TCP_NOPUSH,
which does almost what TCP_CORK does under Linux, except it doesn't seem
to have the notion of uncorking - you can turn NOPUSH off, but apparently
it doesn't affect queued packets. This makes it even less clear why they
have the ugly sendfile)

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Rick Jones

Linus Torvalds wrote:
> 
> On Wed, 17 Jan 2001, Rick Jones wrote:
> >
> > > The fact that I understand _why_ it is done that way doesn't mean that I
> > > don't think it's a hack. It doesn't allow you to sendfile multiple files
> > > etc without having nagle boundaries, and the header/trailer stuff really
> > > isn't a generic solution.
> >
> > Hmm, I would think that nagle would only come into play if those files
> > were each less than MSS and there were no intervening application level
> > reply/request messages for each.
> 
> It's not the file itself - it's the headers and trailers.

OK, the sum of the header/trailer/file when one calls an HP-UX-style
sendfile(). All that does is make it more likely that one will have
sends larger than the MSS.

>  - the packet boundary between the header and the file you're sending.
> 
> Normally, if you do a separate data "send()" for the header before
> actually using sendfile(), the header would be sent out as one packet,
> while the actual file contents would then get coalesced into MSS-sized
> packets.
> 
> This is why people originally did writev() and sendmsg() - to allow
> people to do scatter-gather without having multiple packets on the
> wire, and letting the OS choose the best packet boundaries, of course.

I prefer to describe it as "presenting logically associated data to the
transport at one time" but that's just wordsmithing.

> So the Linux approach (and, obviously, in my opinion the only right
> approach) is basically to
> 
>  (a) make sure that system call latency is low enough that there really
>  aren't any major reasons to avoid system calls. They're just function
>  calls - they may be a bit heavier than most functions, of course, but
>  people shouldn't need to avoid them like the plague like on some
>  systems.

i'm not quite sure how it plays here, but someone once told me that the
most efficient procedure call was the one that was never made :)

> and
> 
>  (b) TCP_CORK.
> 
> Now, TCP_CORK is basically me telling David Miller that I refuse to play
> games to have good packet size distribution, and that I wanted a way for
> the application to just tell the OS: I want big packets, please wait until
> you get enough data from me that you can make big packets.
> 
> Basically, TCP_CORK is a kind of "anti-nagle" flag. It's the reverse of
> "no-nagle". So you'd "cork" the TCP connection when you know you are going
> to do bulk transfers, and when you're done with the bulk transfer you just
> "uncork" it. At which point the normal rules take effect (ie normally
> "send out any partial packets if you have no packets in flight").

How "bulk" is a bulk transfer in your thinking? By the time the transfer
gets above something like 100*MSS I would think that the first small
packet would become epsilon. 

How does CORKing interact with ACK generation? In particular how it
might interact with (or rather possibly induce) standalone ACKs?

> This is a _much_ better interface than having to play games with
> scatter-gather lists etc. You could basically just do
> 
> int optval = 1;
> 
> setsockopt(sk, SOL_TCP, TCP_CORK, &optval, sizeof(int));
> write(sk, ..);
> write(sk, ..);
> write(sk, ..);
> sendfile(sk, ..);
> write(..)
> printf(...);
> ...any kind of output..
> 
> optval = 0;
> setsockopt(sk, SOL_TCP, TCP_CORK, &optval, sizeof(int));
> 
> and notice how you don't need to worry about _how_ you output the data any
> more. It will automatically generate the best packet sizes - waiting for
> disk if necessary etc.
> 
> With TCP_CORK, you can obviously and trivially emulate the HP-UX behaviour
> if you want to. But you can just do _soo_ much more.
> 
> Imagine, for example, keep-alive http connections. Where you might be
> doing multiple sendfile()'s of small files over the same connection, one
> after the other. With Linux and TCP_CORK, what you can basically do is to
> just cork the connection at the beginning, and then let is stay corked for
> as long as you don't have any outstanding requests - ie you uncork only
> when you don't have anything pending any more.

so after i present each reply, i'm checking to see if there is another
request and if there is not i have to uncork to get the residual data to
flow.

> (The reason you want to uncork at all, is to obviously let the partial
> packets out when you don't know if you'll write anything more in the near
> future. Uncorking is important too.
> 
> Basically, TCP_CORK is useful whenever the server knows the patterns of
> its bulk transfers. Which is just about 100% of the time with any kind of
> file serving.

But does the server know the arrival pattern (well time distribution) of
requests? It seems that one depends on a client being helpful about
getting requests to the server in groups otherwise one is corking and
uncorking the connection. If they are strung-out, the serve

Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Dan Kegel

Dean Gaudet wrote:
> consider the case where you're responding to a pair of pipelined HTTP/1.1 
> requests. with the HPUX and BSD sendfile() APIs you end up forcing a 
> packet boundary between the two responses. this is likely to result in 
> one small packet on the wire after each response. 
> 
> with the linux TCP_CORK API you only get one trailing small packet

Tony Finch tells me that BSD also supports TCP_CORK; in fact, it had it first.
He wrote:
> BSDs that include T/TCP (pretty much all of them since 1995) have an
> option called TCP_NOPUSH which is equivalent to Linux's TCP_CORK. A
> pity the Linux people didn't know about it when they implemented their
> version.
>  
> #if defined(TCP_CORK) && !defined(TCP_NOPUSH)
> #define TCP_NOPUSH TCP_CORK
> #endif

Can anyone verify it resolves the problem Dean pointed out?

Now, Linus, does that make you hate BSD less? :-)
- Dan



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Linus Torvalds



On Wed, 17 Jan 2001, Rick Jones wrote:
>
> > The fact that I understand _why_ it is done that way doesn't mean that I
> > don't think it's a hack. It doesn't allow you to sendfile multiple files
> > etc without having nagle boundaries, and the header/trailer stuff really
> > isn't a generic solution.
> 
> Hmm, I would think that nagle would only come into play if those files
> were each less than MSS and there were no intervening application level
> reply/request messages for each.

It's not the file itself - it's the headers and trailers.

The reason you want to have headers and trailers in your sendfile() is
two-fold:

 - if you have high system call latency, it can make a difference.

   This one simply isn't an issue with Linux. System calls are cheap, and
   I'd rather optimize them further than make them uglier. 

 - the packet boundary between the header and the file you're sending.

Normally, if you do a separate data "send()" for the header before
actually using sendfile(), the header would be sent out as one packet,
while the actual file contents would then get coalesced into MSS-sized
packets.

This is why people originally did writev() and sendmsg() - to allow
people to do scatter-gather without having multiple packets on the
wire, and letting the OS choose the best packet boundaries, of course.

So the Linux approach (and, obviously, in my opinion the only right
approach) is basically to 

 (a) make sure that system call latency is low enough that there really
 aren't any major reasons to avoid system calls. They're just function
 calls - they may be a bit heavier than most functions, of course, but
 people shouldn't need to avoid them like the plague like on some
 systems.

and

 (b) TCP_CORK. 

Now, TCP_CORK is basically me telling David Miller that I refuse to play
games to have good packet size distribution, and that I wanted a way for
the application to just tell the OS: I want big packets, please wait until
you get enough data from me that you can make big packets.

Basically, TCP_CORK is a kind of "anti-nagle" flag. It's the reverse of
"no-nagle". So you'd "cork" the TCP connection when you know you are going
to do bulk transfers, and when you're done with the bulk transfer you just
"uncork" it. At which point the normal rules take effect (ie normally
"send out any partial packets if you have no packets in flight").

This is a _much_ better interface than having to play games with
scatter-gather lists etc. You could basically just do

int optval = 1;

setsockopt(sk, SOL_TCP, TCP_CORK, &optval, sizeof(int));
write(sk, ..);
write(sk, ..);
write(sk, ..);
sendfile(sk, ..);
write(sk, ..);
printf(...);
...any kind of output..

optval = 0;
setsockopt(sk, SOL_TCP, TCP_CORK, &optval, sizeof(int));

and notice how you don't need to worry about _how_ you output the data any
more. It will automatically generate the best packet sizes - waiting for
disk if necessary etc.
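
A compilable version of the fragment above might look like the following
sketch. The helper names set_cork(), get_cork() and write_corked() are
hypothetical, and IPPROTO_TCP is used in place of SOL_TCP (both have the
value 6 on Linux). TCP_CORK itself is Linux-specific.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK. Returns 0 or -1. */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Read the cork state back: 1 if corked, 0 if not, -1 on error. */
int get_cork(int fd)
{
    int on = 0;
    socklen_t len = sizeof(on);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, &len) < 0)
        return -1;
    return on != 0;
}

/* Hypothetical helper: cork, write every part in order, then uncork so
 * that the trailing partial packet is released. */
int write_corked(int fd, const char *const parts[],
                 const size_t lens[], size_t n)
{
    int rc = 0;
    if (set_cork(fd, 1) < 0)
        return -1;
    for (size_t i = 0; i < n && rc == 0; i++) {
        size_t off = 0;
        while (off < lens[i]) {
            ssize_t w = write(fd, parts[i] + off, lens[i] - off);
            if (w < 0) {
                rc = -1;
                break;
            }
            off += (size_t)w;
        }
    }
    /* Always try to uncork, even after a failed write. */
    if (set_cork(fd, 0) < 0)
        rc = -1;
    return rc;
}
```

The uncork step runs on the error path too, so a failed write does not
leave the socket corked by accident.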

With TCP_CORK, you can obviously and trivially emulate the HP-UX behaviour
if you want to. But you can just do _soo_ much more.

Imagine, for example, keep-alive http connections. Where you might be
doing multiple sendfile()'s of small files over the same connection, one
after the other. With Linux and TCP_CORK, what you can basically do is to
just cork the connection at the beginning, and then let it stay corked for
as long as you don't have any outstanding requests - ie you uncork only
when you don't have anything pending any more.

(The reason you want to uncork at all, is to obviously let the partial
packets out when you don't know if you'll write anything more in the near
future. Uncorking is important too.)
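
As a rough sketch of the keep-alive case, several files could be pushed
through one corked connection like this. The function name
send_files_corked() and its parameters are hypothetical; the caller is
assumed to have opened the files already and to know their sizes. This uses
the Linux sendfile() from <sys/sendfile.h>.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

static int cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: cork once, send n files back to back, and uncork
 * only when nothing more is pending. Returns 0 on success, -1 on error. */
int send_files_corked(int sock, const int *file_fds,
                      const size_t *sizes, size_t n)
{
    int rc = 0;
    if (cork(sock, 1) < 0)
        return -1;
    for (size_t i = 0; i < n && rc == 0; i++) {
        off_t off = 0;
        size_t left = sizes[i];
        while (left > 0) {
            ssize_t sent = sendfile(sock, file_fds[i], &off, left);
            if (sent <= 0) {   /* error, or file shorter than expected */
                rc = -1;
                break;
            }
            left -= (size_t)sent;
        }
    }
    /* Uncork so the last partial packet is not held back. */
    if (cork(sock, 0) < 0)
        rc = -1;
    return rc;
}
```

Because the cork stays in place across files, the tail of one small file
and the start of the next can share a packet.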

Basically, TCP_CORK is useful whenever the server knows the patterns of
its bulk transfers. Which is just about 100% of the time with any kind of
file serving.

Linus




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Rick Jones


> > Hmm, I would think that nagle would only come into play if those files
> > were each less than MSS and there were no intervening application level
> > reply/request messages for each.
> 
> actually the problem isn't nagle...  nagle needs to be turned off for
> efficient servers anyhow.  

i'm not sure I follow that. could you expand on that a bit?

> but once it's turned off, the standard socket
> API requires (or rather allows) the kernel to flush packets to the wire
> after each system call.

most definitely allows, not requires.

> consider the case where you're responding to a pair of pipelined HTTP/1.1
> requests.  with the HPUX and BSD sendfile() APIs you end up forcing a
> packet boundary between the two responses.  this is likely to result in
> one small packet on the wire after each response.

i _possibly_ have a packet boundary. if the last small bit of the first
file is handed to the transport when there is sufficient classic and
congestion window to send it or that window "arrives" before the first
chunk of the second file is sent.

on the topic of pipelining - do the pipelined requests tend to be sent
or arrive together? 

> with the linux TCP_CORK API you only get one trailing small packet.  in
> case you haven't heard of TCP_CORK -- when the cork is set, the kernel is
> free to send any maximum size packets it can form, but has to hold on to
> the stragglers until userland gives it more data or pops the cork.

i'd heard interesting generalities but no specifics. for instance, when
the send is small, does TCP wait exclusively for the app to flush, or is
there an "if all else fails" sort of timer running?

> (the heuristic i use in apache to decide if i need to flush responses in a
> pipeline is to look if there are any more requests to read first, and if
> there are none then i flush before blocking waiting for new requests.)

how often do you find yourself flushing the little bits anyhow?

> > As for the header/trailer stuff, you're right, I should have spec'd a
> > separate iovec for each :)
> 
> well, if you've got low system call overhead (such as linux ;), and you
> add TCP_CORK ... then you don't even need to combine all those system
> calls into one monster syscall.

how low is the system call overhead to check for the next request before
you flush?

(i'm not sure that I'd say HP-UX sendfile() was a combination of system
calls - i'd probably say it was a (partial) replacement for writev())

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread dean gaudet

On Wed, 17 Jan 2001, Rick Jones wrote:

> > The fact that I understand _why_ it is done that way doesn't mean that I
> > don't think it's a hack. It doesn't allow you to sendfile multiple files
> > etc without having nagle boundaries, and the header/trailer stuff really
> > isn't a generic solution.
>
> Hmm, I would think that nagle would only come into play if those files
> were each less than MSS and there were no intervening application level
> reply/request messages for each.

actually the problem isn't nagle...  nagle needs to be turned off for
efficient servers anyhow.  but once it's turned off, the standard socket
API requires (or rather allows) the kernel to flush packets to the wire
after each system call.

consider the case where you're responding to a pair of pipelined HTTP/1.1
requests.  with the HPUX and BSD sendfile() APIs you end up forcing a
packet boundary between the two responses.  this is likely to result in
one small packet on the wire after each response.

with the linux TCP_CORK API you only get one trailing small packet.  in
case you haven't heard of TCP_CORK -- when the cork is set, the kernel is
free to send any maximum size packets it can form, but has to hold on to
the stragglers until userland gives it more data or pops the cork.

(the heuristic i use in apache to decide if i need to flush responses in a
pipeline is to look if there are any more requests to read first, and if
there are none then i flush before blocking waiting for new requests.)
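
That heuristic could be sketched as below, here combined with TCP_CORK
rather than a user-space buffer. This is not Apache's actual code. The
function names has_pending_input() and uncork_if_idle() are hypothetical,
and a zero-timeout poll() is just one assumed way to "look if there are any
more requests to read".

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>

/* Returns 1 if fd has data ready to read right now, 0 if not,
 * -1 on error. Never blocks (timeout is 0). */
int has_pending_input(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&p, 1, 0);
    if (rc < 0)
        return -1;
    return (rc > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Uncork only when no further request is waiting. Returns 1 if the
 * socket was uncorked, 0 if it was left corked because more input is
 * pending, -1 on error. */
int uncork_if_idle(int fd)
{
    int off = 0;
    int pending = has_pending_input(fd);
    if (pending < 0)
        return -1;
    if (pending)
        return 0;
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
        return -1;
    return 1;
}
```

A server loop would call uncork_if_idle() after writing each reply and just
before blocking for the next request. The cost is one extra poll() system
call per reply.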

> As for the header/trailer stuff, you're right, I should have spec'd a
> separate iovec for each :)

well, if you've got low system call overhead (such as linux ;), and you
add TCP_CORK ... then you don't even need to combine all those system
calls into one monster syscall.

-dean




Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Rick Jones

> The fact that I understand _why_ it is done that way doesn't mean that I
> don't think it's a hack. It doesn't allow you to sendfile multiple files
> etc without having nagle boundaries, and the header/trailer stuff really
> isn't a generic solution.

Hmm, I would think that nagle would only come into play if those files
were each less than MSS and there were no intervening application level
reply/request messages for each. So, perhaps rcp, but not FTP nor HTTP.
I'm not sure where the break-even point versus send() is on other OSes,
but it seems to be in the neighborhood of the typical ethernet MSS on
HP-UX.

As for the header/trailer stuff, you're right, I should have spec'd a
separate iovec for each :)

> Also note how I said that it is the BSD people I _despise_. Not The HP-UX

That misunderstanding would be the result of my entering the
conversation in the middle...

> implementation. The HP-UX one is not pretty, but it works. But I hold open
> source people to higher standards. They are supposed to be the people who
> do programming because it's an art-form, not because it's their job.

I'm not sure, but I think I've just been insulted !-) (in case it is not
clear, that is meant as a joke...)

rick jones
-- 
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...



Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]

2001-01-17 Thread Linus Torvalds



Rick Jones <[EMAIL PROTECTED]> wrote:
>
> : >Agreed -- the hard-coded Nagle algorithm makes no sense these days.
> :
> : The fact I dislike about the HP-UX implementation is that it is so
> : _obviously_ stupid.
> :
> : And I have to say that I absolutely despise the BSD people.  They did
> : sendfile() after both Linux and HP-UX had done it, and they must have
> : known about both implementations.  And they chose the HP-UX braindamage,
> : and even brag about the fact that they were stupid and didn't understand
> : TCP_CORK (they don't say so in those exact words, of course - they just
> : show that they were stupid and clueless by the things they brag about).
> :
> : Oh, well. Not everybody can be as goodlooking as me. It's a curse.
> 
> nor it would seem, as humble :)

Yeah.. Humble is my middle name.

> Hello Linus, my name is Rick Jones. I am the person at Hewlett-Packard
> who drafted the "so _obviously_ stupid" sendfile() interface of HP-UX.
> Some of your critique (quoted above) found its way to my inbox and I
> thought I would introduce myself to you to give you an opportunity to
> expand a bit on your criticism. In return, if you like, I would be more
> than happy to describe a bit of the history of sendfile() on HP-UX.
> Perhaps (though I cannot say with any certainty) it will help explain
> why HP-UX sendfile() is spec'd the way it is.

I do realize why sendfile() is specced like it is: if you don't want to
change the networking layer, it's the obvious way to do it. You can
just generate an iovec internally in the kernel, and pass that on to an
unmodified networking layer.

Hey, that's the way I'd do it too if I didn't have the ear of the
networking people and could tell them that "Psst! THIS is the right way of
doing this".

The fact that I understand _why_ it is done that way doesn't mean that I
don't think it's a hack. It doesn't allow you to sendfile multiple files
etc without having nagle boundaries, and the header/trailer stuff really
isn't a generic solution.

Sendfile() as done in HP-UX is a performance optimization. Fine. But it's
not exactly pretty. It shouldn't be called "sendfile()", it's more of a
"send_a_file_and_these_headers_and_those_trailers()" system call.

Also note how I said that it is the BSD people I _despise_. Not the HP-UX
implementation. The HP-UX one is not pretty, but it works. But I hold open
source people to higher standards. They are supposed to be the people who
do programming because it's an art-form, not because it's their job. 

Linus
