Eitan/Sorin,
This is the code on Linux:
/* Receive a reply. */
error = nl_sock_recv__(sock, buf_txn->reply, false);
if (error) {
if (error == EAGAIN) {
nl_sock_record_errors__(transactions, n, 0);
*done += n;
error = 0;
}
break;
}
I am reading this as “If there’s an error, break out of the loop since the
current transaction failed, and it is highly unlikely that future transactions
in this batch will succeed”. At this point, we are not counting the current
transaction that failed (with error other than EAGAIN), and hence we are not
incrementing ‘*done’. I am advocating for the same logic to be followed on
Windows also, in addition to breaking out of the loop.
There’s special handling for EAGIN which implies that there’s no reply queued
up in the socket in the kernel. It is not an error as such. So, record EAGAIN
in the transactions, and return as though the transaction completed and let the
caller handle EAGAIN.
If you are particular that we should increment ‘*done’ on error, I am fine with
it. I am working on the patch to make the kernel changes to return success
during regular errors too. We’ll revisit this at that point if need be, since
we’ll have the concrete kernel patch and come up with whatever is good
end-to-end.
Sorin’s patch is good to go in, IMO.
thanks,
-- Nithin
> On Apr 16, 2015, at 7:06 AM, Eitan Eliahu <[email protected]> wrote:
>
>
> As I understand the semantics, we need to return an error and bail out only
> when there a "system level" error which equivalent to a transport error or
> disconnection in a standard IPC. All transaction based errors should return a
> success by the I/O control and an NL error message should be built in the
> reply buffer.
> Incrementing the *done variable would be closer to what we want to achieve
> but until we aligned the driver with this policy this change would not be
> complete. For now, we just want to prevent the state where we enter an
> infinite loop.
> Thanks,
> Eitan
>
> -----Original Message-----
> From: Sorin Vinturis [mailto:[email protected]]
> Sent: Wednesday, April 15, 2015 9:10 PM
> To: Nithin Raju; Eitan Eliahu
> Cc: [email protected]
> Subject: RE: [ovs-dev] [PATCH v4] netlink-socket: Exit NL transaction loop
> when EINVAL is returned
>
> Returning error on driver request failure would exit the while loop no matter
> if the done counter is incremented or not.
> So either we return error and exit immediately the while loop, or increment
> the done counter and continue to discuss to an unresponsive driver until the
> transactions are consumed.
>
> -----Original Message-----
> From: Nithin Raju [mailto:[email protected]]
> Sent: Thursday, 16 April, 2015 00:22
> To: Eitan Eliahu
> Cc: Sorin Vinturis; [email protected]
> Subject: Re: [ovs-dev] [PATCH v4] netlink-socket: Exit NL transaction loop
> when EINVAL is returned
>
>> On Apr 15, 2015, at 1:49 PM, Eitan Eliahu <[email protected]> wrote:
>>
>> If we remove the increment of the transaction than the whole sequence of
>> transaction will be lost.
>
> This is not true. We increment ‘*done’ after each successful transaction:
> 48 /* Count the number of successful transactions. */
> 49 (*done)++;
>
>> This should be done only after we change the driver to never fail in case
>> the transaction fail. The current driver fails also when there are
>> transaction level erros and when there are some other "temporary" errors. I
>> would prefer to move to the next transaction in that case rather than dump
>> the whole sequence.
>
> Yes, the driver needs to be updated to return STATUS_SUCCESS in most cases,
> unless there are issues with the genetlink header itself. If there’s such an
> error value returned by the kernel, instead of checking 'error = EINVAL’, we
> should check for GetLastError(). Otherwise, nl_sock_transact_multiple__()
> should behave as though no error occurred.
>
> -- Nithin
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev