On 3/22/21 7:44 PM, Numan Siddique wrote:
> On Fri, Mar 19, 2021 at 8:10 PM Dumitru Ceara <[email protected]> wrote:
>>
>> On 3/19/21 1:05 PM, [email protected] wrote:
>>> From: Numan Siddique <[email protected]>
>>>
>>> When a port binding type changes from type 'A' to type 'B', then
>>> there are many code paths in the existing binding.c which results
>>> in crashes due to use-after-free or NULL references.
>>>
>>> Below crashes are seen when a container lport is changed to a normal
>>> lport and then deleted.
>>>
>>> ***
>>> (gdb) bt
>>> 0 in raise () from /lib64/libc.so.6
>>> 1 in abort () from /lib64/libc.so.6
>>> 2 ovs_abort_valist ("%s: assertion %s failed in %s()") at lib/util.c:419
>>> 3 vlog_abort_valist ("%s: assertion %s failed in %s()") at
>>> lib/vlog.c:1249
>>> 4 vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
>>> 5 ovs_assert_failure (where="lib/ovsdb-idl.c:4653",
>>> function="ovsdb_idl_txn_write__",
>>> condition="row->new_datum != NULL") at
>>> lib/util.c:86
>>> 6 ovsdb_idl_txn_write__ () at lib/ovsdb-idl.c:4695
>>> 7 ovsdb_idl_txn_write_clone () at lib/ovsdb-idl.c:4757
>>> 8 sbrec_port_binding_set_chassis () at lib/ovn-sb-idl.c:25946
>>> 9 release_lport () at controller/binding.c:971
>>> 10 release_local_binding_children () at controller/binding.c:1039
>>> 11 release_local_binding () at controller/binding.c:1056
>>> 12 consider_iface_release (iface_rec=..
>>> iface_id="bb43e818-b2ee-4329-b67e-218556580056") at
>>> controller/binding.c:1880
>>> 13 binding_handle_ovs_interface_changes () at controller/binding.c:1998
>>> 14 runtime_data_ovs_interface_handler () at
>>> controller/ovn-controller.c:1481
>>> 15 engine_compute () at lib/inc-proc-eng.c:306
>>> 16 engine_run_node () at lib/inc-proc-eng.c:352
>>> 17 engine_run () at lib/inc-proc-eng.c:377
>>> 18 main () at controller/ovn-controller.c:2826
>>>
>>> The present code creates a 'struct local_binding' instance for a
>>> container lport and adds this object to the parent local binding
>>> children list. And if the container lport is changed to a normal
>>> vif, then there is no way to access the local binding object created
>>> earlier. This patch fixes these type of issues by refactoring the
>>> 'local binding' code of binding.c. This patch now creates only one
>>> instance of 'struct local_binding' for every OVS interface with
>>> external_ids:iface-id set. A new structure 'struct binding_lport' is
>>> added which is created for a VIF, container and virtual port bindings
>>> and is stored in 'binding_lports' shash. 'struct local_binding' now
>>> maintains a list of binding_lports which it maps to.
>>>
>>> When a container lport is changed to a normal lport, we now can
>>> easily access the 'binding_lport' object of the container lport
>>> fron the 'binding_lports' shash.
>>>
>>> Reported-by: Dumitru Ceara <[email protected]>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1936328
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1936331
>>>
>>> Co-authored-by: Dumitru Ceara <[email protected]>
>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>> Signed-off-by: Numan Siddique <[email protected]>
>>> ---
>>
>> Hi Numan,
>>
>> I didn't do a full review of the code yet but one thing comes to mind.
>>
>> Wouldn't it simplify the code if we always dealt with an updated port
>> binding record in the same way we handle a "delete" folowed by an "insert"?
>>
>> We already store all the relevant information about the old port binding
>> (and if we don't we maybe should) so we could process every port binding
>> update as:
>> a. first delete the port binding resources based on the information we
>> have in memory about the old port binding record.
>> b. insert the new port binding based on the new contents of the record.
>>
> Hi Dumitru,
Hi Numan,
>
> Thanks for the review comments. I am thinking about your suggestion.
> I'm not very sure if this would make the code easier. I have few questions
> for you and few observations.
>
> 1. Let's say we have a lport by name - "lp1" of type VIF. When
> an OVS interface is created for it we claim the lport and set the
> port_binding.chassis
> column. Once the ovn-controller gets the update jsonrpc message
> from ovsdb-server,
> what should we do ? Because it will be a port_binding update ?
> From what I understand from your comment, we should consider it as
> delete first - which means we need
> to delete the binding_lport object for this and remove the lport
> 'lp1' from the b_ctx_out->local_lports,
> b_ctx_out->local_lport_ids etc And then consider it as an insert,
> in which case
> we will create the binding_lport again and also add back in
> local_lports and local_lport_ids.
> We will also end up calling remove_local_lport_ids() first and
> then update_local_lport_ids().
> This may cause unnecessary deletions and additions in tracked_datapaths.
> Correct me If I'm wrong here.
>
> Is this what you think should happen ?
>
> 2. Lets say the lport - "lp1" changes its type from VIF to external.
> If suppose it was bound
> as normal VIF, then we need to clear the chassis column of the
> port bindings. But if we consider
> it as delete first, then we will not be sure if we can call
> 'sbrec_port_binding_set_chassis(pb, NULL)
> unless we add a check if the pb can be accessed or not. I think
> this case will be tricky to handle.
> I'm not sure if we can consider a port binding update as a delete
> followed by insert.
I was kind of vague in my initial question, I apologize. I was thinking
more of abstracting the operations we do for various Southbound updates.
For example, a:
- southbound port binding delete triggers:
a. a "binding release" in the binding.c in-memory binding
representation.
- southbound port binding insert triggers:
a. a "binding create" in the binding.c in-memory binding
representation (e.g., when the OVS interface is added first).
b. a "binding claim" in the binding.c in-memory binding
representation.
- soutbound port binding update triggers:
a. if the update is "destructive" (e.g., port binding type changes),
execute the "binding release" followed by "binding create"/"claim"
operations from the two above cases.
It was more of a thought as I was struggling to imagine all possible
combinations of updates but I guess we can discuss more options like
this as a follow up if/when we decide to refactor/harden binding.c.
>
>> Currently binding_handle_port_binding_changes() seems to have a lot of
>> special cases to deal exactly with this kind of change. It's quite hard
>> to validate if we missed any.
>
> You mean this patch adds a lot of special cases ? or these special
> handling already exists in the present code
> and this patch also makes it complicated ?
This patch adds some cases too but that's expected, I guess, because
it's fixing bugs :)
>
> Let me know if I misunderstood you and also your comments.
>
> Thanks
>
>>
>
>
>
>
>
>> I have a couple more comments below.
>>
>> Thanks,
>> Dumitru
>>
>>>
>>> v2 -> v3
>>> ----
>>> * Fixed a crash seen when a container port is changed to normal port
>>> and then deleted in the same transaction. Added the relevant test case
>>> for this.
>>>
>>> v1 -> v2
>>> ----
>>> * Fixed a crash seen when 2 container ports are changed to normal ports
>>> in the same transaction. Added the relevant test case for this.
>>>
>>> controller/binding.c | 968 +++++++++++++++++++++++-------------
>>> controller/binding.h | 32 +-
>>> controller/ovn-controller.c | 25 +-
>>> controller/physical.c | 13 +-
>>> tests/ovn.at | 247 +++++++++
>>> 5 files changed, 887 insertions(+), 398 deletions(-)
>>>
>>> diff --git a/controller/binding.c b/controller/binding.c
>>> index 4e6c756969..40276d1b12 100644
>>> --- a/controller/binding.c
>>> +++ b/controller/binding.c
>>> @@ -597,6 +597,23 @@ remove_local_lport_ids(const struct sbrec_port_binding
>>> *pb,
>>> }
>>> }
>>>
>>
>> <snip>
>>
>>> +
>>> +static void
>>> +local_binding_destroy(struct local_binding *lbinding,
>>> + struct shash *binding_lports)
>>
>> I'm sure there's a good reason to pass binding_lports as NULL sometimes
>> but it's not really clear why and makes me wonder if there are cases
>> when we can leak stuff.
>>
>> A comment might be helpful.
>
> Ack.
>
>>
>>> +{
>>> + struct binding_lport *b_lport;
>>> + LIST_FOR_EACH_POP (b_lport, list_node, &lbinding->binding_lports) {
>>> + b_lport->lbinding = NULL;
>>> + if (binding_lports) {
>>> + binding_lport_delete(binding_lports, b_lport);
>>> + }
>>> + }
>>> +
>>> + free(lbinding->name);
>>> + free(lbinding);
>>> +}
>>> +
>>> +static void
>>> +local_binding_delete(struct local_binding *lbinding,
>>> + struct shash *local_bindings,
>>> + struct shash *binding_lports)
>>> +{
>>> + shash_find_and_delete(local_bindings, lbinding->name);
>>> + local_binding_destroy(lbinding, binding_lports);
>>> +}
>>> +
>>> +/* Returns the primary binding lport if present in lbinding's
>>> + * binding lports list. A binding lport is considered primary
>>> + * if binding lport's type is LP_VIF and the name matches
>>> + * with the 'lbinding'.
>>> + */
>>> +static struct binding_lport *
>>> +local_binding_get_primary_lport(struct local_binding *lbinding)
>>> +{
>>> + if (!lbinding) {
>>> + return NULL;
>>> + }
>>> +
>>> + struct binding_lport *b_lport;
>>
>> I might be wrong, but isn't it easier to just store the "primary lport"
>> as a separate field in 'struct local_binding' instead of having to walk
>> the list every time we need it?
>
> I thought about it. My concern was when an existing primary lport changes
> its type to 'external' or other type, how to handle it ? or if the lport type
> changes from container to VIF which becomes primart and how to handle it.
>
> May be it can be handled much more easier than I thought. I'll explore more
> and try to address your comment.
Cool, thanks!
>
>>
>> <snip>
>>
>>> diff --git a/controller/binding.h b/controller/binding.h
>>> index c9ebef4b17..f75a787dad 100644
>>> --- a/controller/binding.h
>>> +++ b/controller/binding.h
>>> @@ -56,7 +56,7 @@ struct binding_ctx_in {
>>>
>>> struct binding_ctx_out {
>>> struct hmap *local_datapaths;
>>> - struct shash *local_bindings;
>>> + struct local_binding_data *lbinding_data;
>>>
>>> /* sset of (potential) local lports. */
>>> struct sset *local_lports;
>>> @@ -86,28 +86,16 @@ struct binding_ctx_out {
>>> struct hmap *tracked_dp_bindings;
>>> };
>>>
>>> -enum local_binding_type {
>>> - BT_VIF,
>>> - BT_CONTAINER,
>>> - BT_VIRTUAL
>>> +struct local_binding_data {
>>> + struct shash local_bindings;
>>> + struct shash binding_lports;
>>> };
>>
>> This structure is already refering to "local bindings". Maybe it's more
>> clear if we define it like below?
>>
>> struct local_binding_data {
>> struct shash bindings;
>> struct shash lports;
>> };
>
> Ack. Sounds good.
>
> Thanks
> Numan
>
I noticed you sent out a v4, I'll continue the review on it.
Thanks,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev