On Tue, Aug 16, 2016 at 03:33:30PM -0300, Marcelo Ricardo Leitner wrote: > On Wed, Aug 17, 2016 at 02:24:19AM +0800, Xin Long wrote: > > >> > This err returns back to sctp_sendmsg, there sctp will abort asoc. > > > > > > That's not right I think. sctp_sendmsg will only free the asoc if it was > > > created to send that specific chunk. And in this case, this change > > > should have no effect as it can't have sctp_outq_flush() touching > > > several transports in a row. > > > > > > I'm basing on: > > > out_free: > > > if (new_asoc) > > > sctp_association_free(asoc); > > > > > > and sctp_recvmsg will just fetch, return and clear the error via > > > sctp_skb_recv_datagram, but not free it. > > > > > > Do you see any other place freeing it? > > Sorry, you are right, it free assoc just for new_asoc. > > > > > > > >> > > >> That doesn't seem a good idea. > > >> You don't want to abort the association if there is a transient > > >> memory allocation failure. > > >> You also can't drop data chunks. > > > > > > From a system-wise POV, this behavior - to free the new asoc in case of > > > transient memory allocation failure - doesn't seem bad to me. > > > That's what will have to happen if any allocation before it failed and > > > also it helps the system to reduce the stress a little bit. I don't see > > > any inconsistency/problems here because we are not dropping a single > > > random chunk but instead we are actually refusing to initialize a new > > > asoc in such conditions. > > > > > > Nevertheless, I agree that letting the application see ENOMEM errors when > > > the data actually got queued and is being fully handled, as in, it will > > > be retransmitted later, is not be wise, as the application probably > > > won't be able to distinguish from ENOMEMs that it should retry or not. > > > Here I see a problem, yet it's not due to this specific change, perhaps > > > it just got attention because of it. In this situation, we should handle > > > ENOMEMs internally if possible so the application can know that if it > > > hits an ENOMEM, it's real and it has to retry. > > If letting the application see ENOMEM errors, and sctp has to drop this > > chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM > > chunk may not be the chunk from current msg, as it flush all the queue. > > even if users get an ENOMEM error, they may re-send a chunk that is same > > with the one that is still in retransmit queue. > > Yep, one more reason to handle those internally when safe.
Xin, maybe you can squash this patch and this ENOMEM handling? I'm thinking that handling ENOMEM may result in similar situations in other places, so we have a common reasoning on them.