On 4/25/23 14:11, Ilya Maximets wrote:
> On 4/25/23 00:12, Terry Wilson via discuss wrote:
>> A performance issue that has always bothered me:
>>
>> OVSDB has a set data type that matches up with Python's set data type (an 
>> unordered collection of unique items). The in-tree Python library represents 
>> this set type as a list. Not only that, but every time 
>> Row.__getattr__() is invoked by accessing a set-type column on a Row, it 
>> loops through the values, adds them to a new Python set (presumably to 
>> remove duplicates)...and then returns them as a sorted list. Every single 
>> time the attribute is accessed [1].
>>
>> Some of these sets can be quite huge. In OpenStack Neutron, for example, we 
>> have a default Port Group that all ports are added to by default. This is 
>> many thousands of ports.
>>
>> Now, it would be very simple to just return a set here and users would get 
>> the benefits of both less overhead on attribute access AND the ability to do 
>> O(1) lookups on these sets. Things like "find port groups that have this 
>> port" etc. would be *much* cheaper. The problem is that this breaks the API. 
>> You can no longer do things like Port_Group.ports[0] as set objects are 
>> unordered and do not have __getitem__(), operations like append() don't 
>> exist, etc. This will also break tons of tests because they tend to rely on 
>> order of objects since they do simple string matching. The latter issue is 
>> probably pretty easy to fix by just sorting the results in the tests 
>> themselves.
>>
>> It's probably possible to create a wrapper type object that makes a set that 
>> kinda looks like a list enough to not break things, but that's also pretty 
>> ugly. So I guess my question is, "what do we think about breaking the API at 
>> some point to fix this?" It's pretty terrible behavior, but it's also 
>> annoying when APIs change.
>>
>> Terry
>>
>> [1] 
>> https://github.com/openvswitch/ovs/blob/d70688a7291edb432fd66b9230a92842fcfd3607/python/ovs/db/data.py#L498-L504
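A wrapper of the kind Terry describes - a set dressed up enough like a list to avoid breaking callers - could be sketched roughly like this. All names here are invented for illustration; nothing below is part of the python-ovs API:

```python
# Hypothetical sketch: a list-like view backed by a set, keeping O(1)
# membership tests while preserving indexing for backward compatibility.

class SetBackedList:
    """Presents a set as a sorted, indexable sequence.

    Membership tests hit the underlying set directly; the sorted list
    is built lazily, and only when positional access is requested.
    """

    def __init__(self, items):
        self._set = set(items)
        self._sorted = None  # lazily built sorted snapshot

    def __contains__(self, item):   # O(1) - the main win
        return item in self._set

    def _as_list(self):
        if self._sorted is None:
            self._sorted = sorted(self._set)
        return self._sorted

    def __getitem__(self, index):   # keeps Port_Group.ports[0] working
        return self._as_list()[index]

    def __iter__(self):
        return iter(self._as_list())

    def __len__(self):
        return len(self._set)
```

Mutating operations like append() would still need a decision (forward to the set, or raise), which is part of why such a wrapper gets ugly.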
> Hi, Terry.  Thank you for bringing that up.
> 
> I think the main idea behind the current design was to do whatever the
> C implementation does.  And the C implementation presents IDL objects
> to the user as sorted arrays, which translates to ordered lists in
> Python.  Another major - and probably the most important - point is
> that columns can contain weak references to rows that do not exist;
> the code in [1] makes sure that such references are not visible to
> the user.  This code also resolves UUIDs into row objects.
> 
> Another consideration is that we perform a copy in order to keep
> internal structures from being modified by the user application,
> i.e. to avoid returning direct references to internal structures.
> I'm not sure if that is really important, but I guess some user
> applications may rely on it in some form even if they are not aware
> of it.
> 
> I fully agree that the way it is implemented today is really not the
> Python way of doing things, and it needs to be improved.
> 
> If we go the route of an API change, it should likely be a per-IDL-instance
> configuration knob that alters the behavior, so users can opt in to
> the new API.  We may deprecate the old one eventually, but we'll need
> a transitional period with the usual deprecation warnings, as Python
> libraries typically do.
> A wrapper object may be a solution for the transition period, so that
> we don't have to carry two separate implementations.
> But we still need to translate UUIDs to rows, and I'm not sure how
> to avoid that.
> 
> There is an alternative approach to consider.  I believe the C IDL
> implementation had the same problem ages ago, and the solution that
> is still in place today is to cache the "parsed" version of a column
> and re-parse it whenever the column changes.  For the Python IDL this
> may look like an extra field that holds a sorted list: clear it when
> something changes, and build a new one whenever the user accesses the
> column and the "parsed" version doesn't exist yet.
> It's not ideal, but it should be better than what we have now.
> This carries the same risk of accidental modification of the shared
> internal object, though, but again, that might not be very important.
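The caching idea described above could look something like the following sketch. The class and method names are invented for illustration and do not reflect the actual python-ovs IDL structure:

```python
# Rough sketch of the C IDL's column-caching idea transplanted to
# Python: parse at most once per modification instead of on every
# attribute access.  All names here are illustrative.

class CachedColumn:
    """Holds raw set-typed data plus a lazily built, cached parse."""

    def __init__(self, raw_values):
        self._raw = set(raw_values)  # internal representation
        self._parsed = None          # cached sorted list, or None

    def invalidate(self):
        """Called whenever the raw column data changes."""
        self._parsed = None

    def get(self):
        """Return the parsed (sorted) view, rebuilding it only after
        an invalidation rather than on every access."""
        if self._parsed is None:
            self._parsed = sorted(self._raw)
        return self._parsed

    def add(self, value):
        self._raw.add(value)
        self.invalidate()
```

Note that get() hands out the shared cached list, which is exactly the accidental-modification risk mentioned above; a defensive copy would trade that safety back for the per-access cost.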

I remembered why I didn't end up writing any code the last time
I wanted to fix this issue.  It's because we need to re-parse
the source column whenever a row it references changes.  So,
if column A contains UUIDs of rows in a table T, and a new row in
T is one that a UUID in A points to, then we need to re-parse
column A in order to include the resolved row object for the
previously missing reference.  To do that we need to maintain
source and destination references between rows; see the src/dst
arcs in the C implementation.  The logic around these is fairly
complex, including the management of orphan rows that are
referenced but do not exist.
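The back-reference bookkeeping described above could be sketched like this. All names are invented for illustration; the real C IDL arc management is considerably more involved:

```python
# Minimal sketch of src/dst-arc bookkeeping: when a row appears, every
# cached column holding a reference to its UUID must be invalidated so
# the next access re-parses and resolves the reference.  Names are
# made up for illustration.

from collections import defaultdict

class ArcTracker:
    def __init__(self):
        # target UUID -> set of (row, column) pairs referencing it
        self._back_refs = defaultdict(set)

    def note_reference(self, src_row, src_column, dst_uuid):
        """Record that src_row.src_column references dst_uuid."""
        self._back_refs[dst_uuid].add((src_row, src_column))

    def row_appeared(self, dst_uuid, invalidate):
        """A row with dst_uuid was inserted: invalidate every column
        that was holding a dangling reference to it."""
        for src_row, src_column in self._back_refs.pop(dst_uuid, set()):
            invalidate(src_row, src_column)
```

This only covers insertion; deletions and orphan rows (referenced but never seen) need their own tracking, which is where most of the complexity lives.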

> 
> What do you think?
> 
> Best regards, Ilya Maximets.

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
