Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Thu, 20 Sep 2007, Nagendra Tomar wrote:

> > That's not what POLLOUT means in the Unix meaning. POLLOUT indicates the
> > ability to write, and it is not meant to signal every time a packet
> > (skb) is sent on the wire (and the buffer released).
>
> Aren't they both the same? Every time an incoming ACK frees up a buffer
> from the retransmit queue, the writability condition is freshly asserted,
> much the same way as the readability condition is asserted every time
> new data is queued in the socket receive queue (irrespective of
> whether there was data already waiting to be read in the receive queue).
>
> This difference in meaning of POLLOUT only arises in the ET case, which was
> not what traditional Unix poll referred to.

Again, events here are "readability" and "writeability" and the fact that
the lower network layer freed a buffer is not, per se, an event (unless
before, "writeability" was tested and reported as unavailable).
In your case the solution looks pretty simple. Just create appropriately
sized buffers, split the single sendfile into multiple buffer-sized ones,
and recycle the buffer once each of them completes.

- Davide

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
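The chunking approach suggested in this message could look roughly like the following userspace sketch in C. It is only an illustration under stated assumptions: a chunk is treated as "complete" once sendfile() has accepted all of its bytes, which the thread does not spell out (and which says nothing about whether the data has been ACKed). The names send_in_chunks, note_chunk, and demo_chunked_send are hypothetical and do not come from any posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int chunks_done;  /* hypothetical bookkeeping for the demo */

/* Hypothetical completion hook: a real application would recycle the
 * buffer backing [start, start + len) here. */
static void note_chunk(off_t start, size_t len)
{
    (void)start;
    (void)len;
    chunks_done++;
}

/* Send `total` bytes of in_fd (a regular file) to out_fd, one chunk at a
 * time. Returns the number of bytes sent, or -1 on error. */
static ssize_t send_in_chunks(int out_fd, int in_fd, off_t total, size_t chunk,
                              void (*on_chunk_done)(off_t, size_t))
{
    off_t off = 0;

    assert(chunk > 0);
    while (off < total) {
        off_t start = off;
        off_t left = total - off;
        size_t want = left < (off_t)chunk ? (size_t)left : chunk;
        size_t done = 0;

        while (done < want) {
            ssize_t n = sendfile(out_fd, in_fd, &off, want - done);

            if (n > 0) {
                done += (size_t)n;
            } else if (n == 0) {
                return (ssize_t)off;          /* file shorter than `total` */
            } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Socket buffer full: wait until it is writable again. */
                struct pollfd p = { .fd = out_fd, .events = POLLOUT };

                if (poll(&p, 1, -1) < 0 && errno != EINTR)
                    return -1;
            } else if (errno != EINTR) {
                return -1;
            }
        }
        if (on_chunk_done)
            on_chunk_done(start, want);
    }
    return (ssize_t)off;
}

/* Usage demo: 10000 bytes in 4096-byte chunks over a local socketpair.
 * Returns the number of completed chunks, or -1 on error. */
static int demo_chunked_send(void)
{
    char path[] = "/tmp/chunkdemoXXXXXX";
    int sv[2];
    int fd = mkstemp(path);
    ssize_t sent;

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, 10000) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }
    chunks_done = 0;
    sent = send_in_chunks(sv[0], fd, 10000, 4096, note_chunk);
    close(sv[0]);
    close(sv[1]);
    close(fd);
    return sent == 10000 ? chunks_done : -1;
}
```

The demo uses an AF_UNIX socketpair purely so that it runs without a network peer; a TCP socket would be the real target. The hook fires per chunk rather than per ACK, which is the trade-off this message proposes.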
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> That's not what POLLOUT means in the Unix meaning. POLLOUT indicates the
> ability to write, and it is not meant to signal every time a packet
> (skb) is sent on the wire (and the buffer released).

Aren't they both the same? Every time an incoming ACK frees up a buffer
from the retransmit queue, the writability condition is freshly asserted,
much the same way as the readability condition is asserted every time
new data is queued in the socket receive queue (irrespective of
whether there was data already waiting to be read in the receive queue).

This difference in meaning of POLLOUT only arises in the ET case, which was
not what traditional Unix poll referred to.
Since it's a new game the rules can be modified (of course based on the
merits, i.e. usability).

Thanx,
Tomar
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Thu, 20 Sep 2007, Nagendra Tomar wrote:

> --- Davide Libenzi <[EMAIL PROTECTED]> wrote:
>
> > Looking back at it, I think the current TCP code is right, once you look
> > at the "event" to be an output buffer full->with_space transition.
> > If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event
> > (free space on the output buffer), if you do not consume it (say a
> > tcp_sendmsg that re-fills the buffer), you can't see other OUT events
> > anymore since they happen on the full->with_space transition.
> > Yes, I know, the read side (EPOLLIN) works differently and you get an
> > event for every packet you receive. And yes, I do not like asymmetric
> > things. But that does not make the EPOLLOUT|EPOLLET wrong IMO.
>
> I agree that ET means the event should happen at the transition
> from nospace->space condition, but isn't the other case (event is
> delivered every time the event actually happens) more usable?
> Also the epoll man page says so
>
> "... Edge Triggered event distribution delivers events only when
> events happens on the monitored file."
>
> This serves the purpose of ET (reducing the number of poll events) and
> at the same time makes userspace coding easier. My userspace code
> has the liberty of deciding when it can write to the socket. For example,
> the sendfile buffer management example that I quoted in my earlier post
> will be difficult with the current ET|POLLOUT behaviour. I cannot
> write in full-buffer units. I'll have to write partial buffers just to
> fill the TCP writeq which is needed to trigger the event.

That's not what POLLOUT means in the Unix meaning. POLLOUT indicates the
ability to write, and it is not meant to signal every time a packet
(skb) is sent on the wire (and the buffer released).
In your particular application, you could simply split the sendfile into
appropriately sized chunks, and handle the buffer release in there.

- Davide
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> Unfortunately f_op->poll() does not let the caller specify the events
> it's interested in, that would allow to split send/receive wait queues and
> better detect read/write cases.
> The detection of a waitqueue_active(->sk_wr_sleep) would work fine in
> detecting if someone is actually waiting for a write, w/out the false
> positives triggered by the read-waiters.
> That would be a very sane thing to do, but would require a big dumb change
> to all the ->poll around (that could be automated by a script - devices
> not caring about the events hint can just continue to use the single queue
> like they currently do), and a more critical and gradual change of all the
> devices that want to take advantage of it.
> That way, no more magic bits are needed, and a simple waitqueue_active()
> would tell you if someone is waiting for write-space events.

I like this.
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> Looking back at it, I think the current TCP code is right, once you look
> at the "event" to be an output buffer full->with_space transition.
> If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event
> (free space on the output buffer), if you do not consume it (say a
> tcp_sendmsg that re-fills the buffer), you can't see other OUT events
> anymore since they happen on the full->with_space transition.
> Yes, I know, the read side (EPOLLIN) works differently and you get an
> event for every packet you receive. And yes, I do not like asymmetric
> things. But that does not make the EPOLLOUT|EPOLLET wrong IMO.

I agree that ET means the event should happen at the transition
from nospace->space condition, but isn't the other case (event is
delivered every time the event actually happens) more usable?
Also the epoll man page says so

"... Edge Triggered event distribution delivers events only when
events happens on the monitored file."

This serves the purpose of ET (reducing the number of poll events) and
at the same time makes userspace coding easier. My userspace code
has the liberty of deciding when it can write to the socket. For example,
the sendfile buffer management example that I quoted in my earlier post
will be difficult with the current ET|POLLOUT behaviour. I cannot
write in full-buffer units. I'll have to write partial buffers just to
fill the TCP writeq which is needed to trigger the event.

Thanx,
Tomar
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Thu, 20 Sep 2007, Eric Dumazet wrote:

> Does it mean that with your patch each ACK on an ET managed socket will
> trigger an epoll event?
>
> Maybe your very sensitive high throughput application needs to set a flag
> or something at socket level to ask for such a behavior.
>
> The default should stay as is. That is, an event should be sent only if
> someone cared about the wakeup.

Unfortunately f_op->poll() does not let the caller specify the events
it's interested in, that would allow to split send/receive wait queues and
better detect read/write cases.
The detection of a waitqueue_active(->sk_wr_sleep) would work fine in
detecting if someone is actually waiting for a write, w/out the false
positives triggered by the read-waiters.
That would be a very sane thing to do, but would require a big dumb change
to all the ->poll around (that could be automated by a script - devices
not caring about the events hint can just continue to use the single queue
like they currently do), and a more critical and gradual change of all the
devices that want to take advantage of it.
That way, no more magic bits are needed, and a simple waitqueue_active()
would tell you if someone is waiting for write-space events.

- Davide
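The split-queue idea in this message can be pictured with a small toy model in C. This is not kernel code: toy_wq_sock, toy_poll_wait, toy_wr_queue_active, and toy_write_space are hypothetical stand-ins, and plain counters replace real wait queues. It only shows why a write-side queue would let the write-space path skip work when only readers are waiting.

```c
#include <assert.h>
#include <stdbool.h>

enum { WANT_READ = 1, WANT_WRITE = 2 };   /* hypothetical events hint */

struct toy_wq_sock {
    int rd_waiters;   /* stand-in for a read-side wait queue */
    int wr_waiters;   /* stand-in for the proposed ->sk_wr_sleep queue */
    int wr_wakeups;   /* how many write-space wakeups were issued */
};

/* A poll hook that receives an events hint and queues the waiter on the
 * matching side(s) instead of on one shared queue. */
static void toy_poll_wait(struct toy_wq_sock *sk, int events_hint)
{
    assert(events_hint & (WANT_READ | WANT_WRITE));
    if (events_hint & WANT_READ)
        sk->rd_waiters++;
    if (events_hint & WANT_WRITE)
        sk->wr_waiters++;
}

/* Analogue of waitqueue_active() applied to the write-side queue. */
static bool toy_wr_queue_active(const struct toy_wq_sock *sk)
{
    return sk->wr_waiters > 0;
}

/* Write-space path: wake only if somebody waits for write space, so
 * read-waiters no longer cause false positives. */
static void toy_write_space(struct toy_wq_sock *sk)
{
    if (toy_wr_queue_active(sk))
        sk->wr_wakeups++;
}
```

With only a WANT_READ waiter registered, toy_write_space() does nothing; once a WANT_WRITE waiter is registered, it counts a wakeup, with no extra flag bit needed to make that decision.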
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Wed, 19 Sep 2007, Nagendra Tomar wrote:

> The tcp_check_space() function calls tcp_new_space() only if the
> SOCK_NOSPACE bit is set in the socket flags. This is causing Edge Triggered
> EPOLLOUT events to be missed for TCP sockets, as the ep_poll_callback()
> is not called from the wakeup routine.
>
> The SOCK_NOSPACE bit indicates the user's intent to perform writes
> on that socket (set in tcp_sendmsg and tcp_poll). I believe the idea
> behind the SOCK_NOSPACE check is to optimize away the tcp_new_space call
> in cases when user is not interested in writing to the socket. These two
> take care of all possible scenarios in which a user can convey his intent
> to write on that socket.
>
> Case 1: tcp_sendmsg detects lack of sndbuf space
> Case 2: tcp_poll returns not writable
>
> This is fine if we do not deal with epoll's Edge Triggered events (EPOLLET).
> With ET events we can have a scenario where the SOCK_NOSPACE bit is not set,
> as the user has neither done a sendmsg nor a poll/epoll call that returned
> with the POLLOUT condition not set.

Looking back at it, I think the current TCP code is right, once you look
at the "event" to be an output buffer full->with_space transition.
If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event
(free space on the output buffer), if you do not consume it (say a
tcp_sendmsg that re-fills the buffer), you can't see other OUT events
anymore since they happen on the full->with_space transition.
Yes, I know, the read side (EPOLLIN) works differently and you get an
event for every packet you receive. And yes, I do not like asymmetric
things. But that does not make the EPOLLOUT|EPOLLET wrong IMO.

- Davide
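The full->with_space reading of EPOLLOUT|EPOLLET described in this message can be observed from userspace with a sketch like the one below. It is a small experiment and not the test case from the patch: it uses an AF_UNIX socketpair in place of TCP so it runs without a network peer, so it says nothing about the SOCK_NOSPACE check itself. The names wait_out and et_epollout_demo are hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of ready events (0 or 1), retrying on EINTR. */
static int wait_out(int ep, int timeout_ms)
{
    struct epoll_event ev;
    int n;

    do {
        n = epoll_wait(ep, &ev, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Records three epoll_wait() results in out[]; returns 0 on success. */
static int et_epollout_demo(int out[3])
{
    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
    char buf[4096] = { 0 };
    int sv[2];
    int ep;

    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) < 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    ev.data.fd = sv[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        return -1;

    out[0] = wait_out(ep, 0);   /* freshly added, socket is writable */
    out[1] = wait_out(ep, 0);   /* still writable, nothing has changed */

    /* Consume the event: fill the send buffer until write() fails. */
    while (write(sv[0], buf, sizeof buf) > 0)
        ;
    /* Peer drains everything, which frees send-buffer space again. */
    while (read(sv[1], buf, sizeof buf) > 0)
        ;
    out[2] = wait_out(ep, 1000);  /* after the full -> with_space edge */

    close(ep);
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

On Linux this is expected to record one event, then none, then one again: the second wait sees no new edge even though the socket is still writable, and a new event only shows up after the buffer has been filled and drained.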
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- Eric Dumazet <[EMAIL PROTECTED]> wrote:

> Nagendra Tomar wrote:
> > --- Davide Libenzi <[EMAIL PROTECTED]> wrote:
> >
> >> On Wed, 19 Sep 2007, David Miller wrote:
> >>
> >>> From: Nagendra Tomar <[EMAIL PROTECTED]>
> >>> Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
> >>>
> >>>> With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call
> >>>> will not return, even when the incoming acks free the buffers.
> >>>> Note that this patch assumes that the SOCK_NOSPACE check in
> >>>> tcp_check_space is a trivial optimization which can be safely removed.
> >>> I already replied to your patch posting explaining that whatever is
> >>> not setting SOCK_NOSPACE should be fixed instead.
> >>>
> >>> Please address that, thanks.
> >> You're not planning on putting the notion of a SOCK_NOSPACE bit inside a
> >> completely device-unaware interface like epoll, I hope?
> >
> > Definitely not !
> >
> > The point is that the "tcp write space available"
> > wakeup does not get called if SOCK_NOSPACE bit is not set. This was
> > fine when the wakeup was merely a wakeup (since SOCK_NOSPACE bit
> > indicated that someone really cared about the wakeup). Now after the
> > introduction of callback'ed wakeups, we might have some work to
> > do inside the callback even if there is nobody interested in the wakeup
> > at that point of time.
> >
> > In this particular case the ep_poll_callback is not getting called and
> > hence the socket fd is not getting added to the ready list.
>
> Does it mean that with your patch each ACK on an ET managed socket will
> trigger an epoll event?
>
> Maybe your very sensitive high throughput application needs to set a flag
> or something at socket level to ask for such a behavior.
>
> The default should stay as is. That is, an event should be sent only if
> someone cared about the wakeup.

A high throughput app will always care about the wakeup, or else
it will not be a high throughput app in the first place.
An application that occasionally writes and then goes to slumber and
then writes again will not be a high throughput app.

My point is that the SOCK_NOSPACE check does not save us much.
For a high throughput app it will almost always be set, thus making the
check insignificant, and for the low throughput case we care less.

Thanx,
Tomar
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
Nagendra Tomar wrote:
> --- Davide Libenzi <[EMAIL PROTECTED]> wrote:
>
>> On Wed, 19 Sep 2007, David Miller wrote:
>>
>>> From: Nagendra Tomar <[EMAIL PROTECTED]>
>>> Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
>>>
>>>> With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call
>>>> will not return, even when the incoming acks free the buffers.
>>>> Note that this patch assumes that the SOCK_NOSPACE check in
>>>> tcp_check_space is a trivial optimization which can be safely removed.
>>> I already replied to your patch posting explaining that whatever is
>>> not setting SOCK_NOSPACE should be fixed instead.
>>>
>>> Please address that, thanks.
>> You're not planning on putting the notion of a SOCK_NOSPACE bit inside a
>> completely device-unaware interface like epoll, I hope?
>
> Definitely not !
>
> The point is that the "tcp write space available"
> wakeup does not get called if SOCK_NOSPACE bit is not set. This was
> fine when the wakeup was merely a wakeup (since SOCK_NOSPACE bit
> indicated that someone really cared about the wakeup). Now after the
> introduction of callback'ed wakeups, we might have some work to
> do inside the callback even if there is nobody interested in the wakeup
> at that point of time.
>
> In this particular case the ep_poll_callback is not getting called and
> hence the socket fd is not getting added to the ready list.

Does it mean that with your patch each ACK on an ET managed socket will
trigger an epoll event?

Maybe your very sensitive high throughput application needs to set a flag
or something at socket level to ask for such a behavior.

The default should stay as is. That is, an event should be sent only if
someone cared about the wakeup.
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Wed, 19 Sep 2007, Nagendra Tomar wrote:

> Definitely not!
>
> The point is that the "tcp write space available"
> wakeup does not get called if the SOCK_NOSPACE bit is not set. This was
> fine when the wakeup was merely a wakeup (since the SOCK_NOSPACE bit
> indicated that someone really cared about the wakeup). Now, after the
> introduction of callback'ed wakeups, we might have some work to
> do inside the callback even if there is nobody interested in the wakeup
> at that point of time.
>
> In this particular case the ep_poll_callback is not getting called and
> hence the socket fd is not getting added to the ready list.

I know, I saw the patch. I was just commenting on the point where DaveM
was heading to ;)
These things need to be looked at a little bit more deeply.

- Davide
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- Davide Libenzi <[EMAIL PROTECTED]> wrote:

> On Wed, 19 Sep 2007, David Miller wrote:
>
> > From: Nagendra Tomar <[EMAIL PROTECTED]>
> > Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
> >
> > > With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait
> > > call will not return, even when the incoming acks free the buffers.
> > > Note that this patch assumes that the SOCK_NOSPACE check in
> > > tcp_check_space is a trivial optimization which can be safely removed.
> >
> > I already replied to your patch posting explaining that whatever is
> > not setting SOCK_NOSPACE should be fixed instead.
> >
> > Please address that, thanks.
>
> You're not planning on putting the notion of a SOCK_NOSPACE bit inside a
> completely device-unaware interface like epoll, I hope?

Definitely not!

The point is that the "tcp write space available"
wakeup does not get called if the SOCK_NOSPACE bit is not set. This was
fine when the wakeup was merely a wakeup (since the SOCK_NOSPACE bit
indicated that someone really cared about the wakeup). Now, after the
introduction of callback'ed wakeups, we might have some work to
do inside the callback even if there is nobody interested in the wakeup
at that point of time.

In this particular case the ep_poll_callback is not getting called and
hence the socket fd is not getting added to the ready list.

Thanx,
Tomar
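The point about callback'ed wakeups can be illustrated with a small userspace model. This is a toy, not kernel code: `toy_sock`, `toy_check_space` and `toy_wakeup_callback` are made-up names that only mimic the shape of the check being discussed, and the real write-space path does more work than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a socket. All names are illustrative only. */
struct toy_sock {
    bool queue_shrunk;  /* an ACK freed space in the send queue */
    bool nospace;       /* a writer or poller saw "no space" earlier */
    int  ready_events;  /* how often the wakeup callback ran */
};

/* Stands in for the callback that would put the fd on a ready list. */
static void toy_wakeup_callback(struct toy_sock *sk)
{
    sk->ready_events++;
}

/*
 * Simplified model of the gate under discussion. With require_nospace
 * set, the callback only runs if somebody previously hit "no space";
 * with it cleared (the behaviour the patch proposes), every shrink of
 * the queue reaches the callback.
 */
static void toy_check_space(struct toy_sock *sk, bool require_nospace)
{
    assert(sk != NULL);
    if (!sk->queue_shrunk)
        return;
    sk->queue_shrunk = false;
    if (require_nospace && !sk->nospace)
        return;             /* wakeup, and thus the callback, skipped */
    sk->nospace = false;
    toy_wakeup_callback(sk);
}
```

In this model, a socket whose `nospace` flag was never set sees no callback when space is freed under the gated variant, which is the situation described above for a watcher that never attempted a write or a poll.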
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- David Miller <[EMAIL PROTECTED]> wrote:

> From: Nagendra Tomar <[EMAIL PROTECTED]>
> Date: Wed, 19 Sep 2007 15:55:58 -0700 (PDT)
>
> > I agree that setting SOCK_NOSPACE would have been a more elegant
> > fix. In fact I thought a lot about that before deciding on this fix.
>
> I guess this means you also noticed that you are removing
> the one and only test of this bit too?
>
> You can't remove this, it's critical for performance.

I'm sure you saw value in the check; that's why the check is there.
Now we have two critical points to discuss:

1. How can we achieve the ET EPOLLOUT event with the SOCK_NOSPACE
   check in place?
2. How much effect will removing the check have (if we cannot find
   a way to get the ET EPOLLOUT notification with the check in place)?

Regarding (2), IMHO for a "fast sender" the SOCK_NOSPACE check will
almost always pass, as the sender will come back to write (or poll)
before the previous data is drained out. If he doesn't do that, he is
not a "fast sender" by definition. A "fast sender" should always have
some data to send when he practically (per the sndbuf space) can. For a
"slow sender", do we really care about the optimization?

Thanx,
Tomar
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
On Wed, 19 Sep 2007, David Miller wrote:

> From: Nagendra Tomar <[EMAIL PROTECTED]>
> Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
>
> > With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call
> > will not return, even when the incoming acks free the buffers.
> > Note that this patch assumes that the SOCK_NOSPACE check in
> > tcp_check_space is a trivial optimization which can be safely removed.
>
> I already replied to your patch posting explaining that whatever is
> not setting SOCK_NOSPACE should be fixed instead.
>
> Please address that, thanks.

You're not planning on putting the notion of a SOCK_NOSPACE bit inside a
completely device-unaware interface like epoll, I hope?

- Davide
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
From: Nagendra Tomar <[EMAIL PROTECTED]>
Date: Wed, 19 Sep 2007 15:55:58 -0700 (PDT)

> I agree that setting SOCK_NOSPACE would have been a more elegant
> fix. In fact I thought a lot about that before deciding on this fix.

I guess this means you also noticed that you are removing
the one and only test of this bit too?

You can't remove this, it's critical for performance.
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
--- David Miller <[EMAIL PROTECTED]> wrote:

> From: Nagendra Tomar <[EMAIL PROTECTED]>
> Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)
>
> > With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call
> > will not return, even when the incoming acks free the buffers.
> > Note that this patch assumes that the SOCK_NOSPACE check in
> > tcp_check_space is a trivial optimization which can be safely removed.
>
> I already replied to your patch posting explaining that whatever is
> not setting SOCK_NOSPACE should be fixed instead.
>
> Please address that, thanks.

Dave,
I agree that setting SOCK_NOSPACE would have been a more elegant
fix. In fact I thought a lot about that before deciding on this fix.

But the point here is that the SOCK_NOSPACE bit can be set only when the
sndbuf space is really low (less than sk_stream_min_wspace()) and
some user action (sendmsg or poll) indicated his intent to write.
In the case mentioned, neither of these is true. Since the user wants to
manage his transmit buffers himself, his definition of "low" may not
match what the kernel considers low. For example, the user might have
dynamically changing mmap'ed buffer resources at his disposal which he
wants to use as sendfile buffers. He wants to be notified whenever
a new incoming ack frees up one or more of his buffers, so that he can
reuse them.

The bigger problem is that the user is not indicating his intent to
write to the kernel. He is just watching the sndbuf space, and when
it matches his needs he will send new data.

Thanx,
Tomar
Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
From: Nagendra Tomar <[EMAIL PROTECTED]>
Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)

> With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call
> will not return, even when the incoming acks free the buffers.
> Note that this patch assumes that the SOCK_NOSPACE check in
> tcp_check_space is a trivial optimization which can be safely removed.

I already replied to your patch posting explaining that whatever is
not setting SOCK_NOSPACE should be fixed instead.

Please address that, thanks.
[PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets
The tcp_check_space() function calls tcp_new_space() only if the
SOCK_NOSPACE bit is set in the socket flags. This is causing Edge
Triggered EPOLLOUT events to be missed for TCP sockets, as the
ep_poll_callback() is not called from the wakeup routine.

The SOCK_NOSPACE bit indicates the user's intent to perform writes
on that socket (set in tcp_sendmsg and tcp_poll). I believe the idea
behind the SOCK_NOSPACE check is to optimize away the tcp_new_space
call in cases when the user is not interested in writing to the socket.
These two call sites take care of all possible scenarios in which a
user can convey his intent to write on that socket.

Case 1: tcp_sendmsg detects lack of sndbuf space
Case 2: tcp_poll returns not writable

This is fine if we do not deal with epoll's Edge Triggered events
(EPOLLET). With ET events we can have a scenario where the SOCK_NOSPACE
bit is not set, as the user has neither done a sendmsg nor a poll/epoll
call that returned with the POLLOUT condition not set. In this case the
user will _never_ get an ET POLLOUT event, since tcp_check_space() will
not call tcp_new_space() (as the SOCK_NOSPACE bit is not set), which
does the real work.

THIS IS AGAINST THE EPOLL ET PROMISE OF DELIVERING AN EVENT WHENEVER THE
EVENT ACTUALLY HAPPENS.

This ET event will be very helpful to implement user level memory
management for mmap+sendfile zero copy Tx. So typically the application
does this:

void *alloc_sendfile_buf(void)
{
	while (!next_free_buffer)
	{
		/*
		 * No free buffers (all are dispatched to sendfile and are
		 * in use). Wait for one or more buffers to become free.
		 * The socket fd is registered with EPOLLET|EPOLLOUT events.
		 * EPOLLET enables us to check for SIOCOUTQ only when some
		 * more space becomes available.
		 *
		 * One would expect the ET EPOLLOUT event to be notified
		 * when TCP space is freed due to some ack coming in.
		 */
		epoll_wait(...);

		/*
		 * wait for some incoming ack to free some buffer from
		 * the retransmit queue
		 */
		ioctl(fd, SIOCOUTQ, &in_outq);

		/*
		 * see if we can mark some more "complete" buffers free.
		 * If it can mark one or more buffers free, it will set
		 * next_free_buffer to point to the available buffer to use
		 */
		rehash_free_buffers(in_outq);
	}

	return next_free_buffer;
}

With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call
will not return, even when the incoming acks free the buffers.
Note that this patch assumes that the SOCK_NOSPACE check in
tcp_check_space is a trivial optimization which can be safely removed.

Thanx,
Tomar

Signed-off-by: Nagendra Singh Tomar <[EMAIL PROTECTED]>
---
--- linux-2.6.23-rc6/net/ipv4/tcp_input.c.orig	2007-09-19 13:58:44.0 +0530
+++ linux-2.6.23-rc6/net/ipv4/tcp_input.c	2007-09-19 10:17:36.0 +0530
@@ -3929,8 +3929,7 @@ static void tcp_check_space(struct sock
 {
 	if (sock_flag(sk, SOCK_QUEUE_SHRUNK)) {
 		sock_reset_flag(sk, SOCK_QUEUE_SHRUNK);
-		if (sk->sk_socket &&
-		    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+		if (sk->sk_socket)
 			tcp_new_space(sk);
 	}
 }
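For reference, the waiting step in alloc_sendfile_buf() can be written out as a small compilable sketch. The helper names `watch_write_space` and `wait_outq_at_most` are hypothetical, and the finite `timeout_ms` is an assumption added here so that the loop still makes progress if no edge-triggered EPOLLOUT event arrives; it is not part of the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <linux/sockios.h>   /* SIOCOUTQ */
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Register fd for edge-triggered "writable" notifications. */
static int watch_write_space(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };

    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Block until at most `limit` bytes remain in the socket send queue.
 * Returns the queue size that satisfied the limit, or -1 on error.
 * Each pass re-reads SIOCOUTQ, so a timeout simply causes a re-check.
 */
static int wait_outq_at_most(int epfd, int fd, int limit, int timeout_ms)
{
    assert(limit >= 0);
    for (;;) {
        struct epoll_event ev;
        int outq = 0;

        if (ioctl(fd, SIOCOUTQ, &outq) < 0)
            return -1;
        if (outq <= limit)
            return outq;

        /* Sleep until an event or the timeout, then look again. */
        if (epoll_wait(epfd, &ev, 1, timeout_ms) < 0 && errno != EINTR)
            return -1;
    }
}
```

An application would call wait_outq_at_most() where the example calls epoll_wait() and ioctl(), and then pass the returned queue size to its own buffer bookkeeping (rehash_free_buffers() in the example).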