Re: [ovs-dev] [PATCH v4] netlink-socket: Exit NL transaction loop when EINVAL is returned

Nithin Raju Thu, 16 Apr 2015 09:07:47 -0700

Eitan/Sorin,
This is the code on Linux:

        /* Receive a reply. */                              
        error = nl_sock_recv__(sock, buf_txn->reply, false);
        if (error) {                                        
            if (error == EAGAIN) {                          
                nl_sock_record_errors__(transactions, n, 0);
                *done += n;                                 
                error = 0;                                  
            }                                               
            break;                                          
        }


I am reading this as “If there’s an error, break out of the loop since the 
current transaction failed, and it is highly unlikely that future transactions 
in this batch will succeed”. At this point, we are not counting the current 
transaction that failed (with error other than EAGAIN), and hence we are not 
incrementing ‘*done’. I am advocating for the same logic to be followed on 
Windows also, in addition to breaking out of the loop.

There’s special handling for EAGIN which implies that there’s no reply queued 
up in the socket in the kernel. It is not an error as such. So, record EAGAIN 
in the transactions, and return as though the transaction completed and let the 
caller handle EAGAIN.

If you are particular that we should increment ‘*done’ on error, I am fine with 
it. I am working on the patch to make the kernel changes to return success 
during regular errors too. We’ll revisit this at that point if need be, since 
we’ll have the concrete kernel patch and come up with whatever is good 
end-to-end.

Sorin’s patch is good to go in, IMO.

thanks,
-- Nithin


> On Apr 16, 2015, at 7:06 AM, Eitan Eliahu <[email protected]> wrote:
> 
> 
> As I understand the semantics, we need to return an error and bail out only 
> when there a "system level" error which equivalent to a transport error or 
> disconnection in a standard IPC. All transaction based errors should return a 
> success by the I/O control and an NL error message should be  built in the 
> reply buffer.
> Incrementing the *done variable would be closer to what we want to achieve 
> but until we aligned the driver with this policy this change would not be 
> complete. For now, we just want to prevent the state where we enter an 
> infinite loop.
> Thanks,
> Eitan
> 
> -----Original Message-----
> From: Sorin Vinturis [mailto:[email protected]] 
> Sent: Wednesday, April 15, 2015 9:10 PM
> To: Nithin Raju; Eitan Eliahu
> Cc: [email protected]
> Subject: RE: [ovs-dev] [PATCH v4] netlink-socket: Exit NL transaction loop 
> when EINVAL is returned
> 
> Returning error on driver request failure would exit the while loop no matter 
> if the done counter is incremented or not.
> So either we return error and exit immediately the while loop, or increment 
> the done counter and continue to discuss to an unresponsive driver until the 
> transactions are consumed.
> 
> -----Original Message-----
> From: Nithin Raju [mailto:[email protected]] 
> Sent: Thursday, 16 April, 2015 00:22
> To: Eitan Eliahu
> Cc: Sorin Vinturis; [email protected]
> Subject: Re: [ovs-dev] [PATCH v4] netlink-socket: Exit NL transaction loop 
> when EINVAL is returned
> 
>> On Apr 15, 2015, at 1:49 PM, Eitan Eliahu <[email protected]> wrote:
>> 
>> If we remove the increment of the transaction than the whole sequence of 
>> transaction will be lost.
> 
> This is not true. We increment ‘*done’ after each successful transaction:
> 48         /* Count the number of successful transactions. */ 
> 49         (*done)++;                                         
> 
>> This should be done only after we change the driver to never fail in case 
>> the transaction fail. The current driver fails also when there are 
>> transaction level erros and when there are some other "temporary" errors. I 
>> would prefer to move to the next transaction in that case rather than dump 
>> the whole sequence. 
> 
> Yes, the driver needs to be updated to return STATUS_SUCCESS in most cases, 
> unless there are issues with the genetlink header itself. If there’s such an 
> error value returned by the kernel, instead of checking 'error = EINVAL’, we 
> should check for GetLastError(). Otherwise, nl_sock_transact_multiple__() 
> should behave as though no error occurred.
> 
> -- Nithin

_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v4] netlink-socket: Exit NL transaction loop when EINVAL is returned

Reply via email to