Hi Han,
I thought about this more over the weekend, and I was hoping I'd get to
respond to my own e-mail before you saw it, because I realized I had a
fundamental misunderstanding of the scope and nature of change handlers.
I'll reply to your comments in-line below.
On 08/05/2018 03:11 PM, Han Zhou wrote:
Hi Mark,
Thanks for the review and very valuable comments! (I was on vacation
last week, so sorry for the slow response).
On Tue, Jul 31, 2018 at 3:47 AM, Mark Michelson <mmich...@redhat.com> wrote:
>
> Hi Han,
>
> I've given this patchset a look, and I was following along pretty
well until I got to about patch 11. From that point on, I had to re-read
the code more times than I care to admit before I finally understood
what was going on :)
>
> What you have is a structure (lflow_ref_list_node) that is
simultaneously in two lists. These two lists are each the data in
separate hmap nodes. The hmap nodes are in two separate hmaps. One hmap
uses the reference type and name as a key, and the other uses the lflow
UUID as a key. This way given an address set name, you can find the
associated logical flow UUIDs in the ref_lflow_table. Or given a logical
flow UUID, you can find address sets.
>
> I'm wondering if this can be simplified somehow.
>
> Right now, if logical flows change, the change handler has to update
the ref_lflow_table so that address sets no longer reference that
logical flow. If address sets change, then the lflow_ref_table is
updated. In both cases consider_logical_flow() gets called and realigns
the tables as appropriate.
>
> The problem with this is that it reeks of cross-cutting concerns, and
it seems like it wouldn't scale well (consider a 3- or 4-chain of
dependencies). Ideally, the dependency chain would make sure that the
change handler for logical flows only deals with logical flows, and the
change handler for address sets only deals with address sets.
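To ground the discussion below, here is roughly the shape of that cross
reference as I understand it from the patches, using OVS's hmap and
ovs_list. The struct and field names are my approximations rather than
the exact ones in the series, and the delete helper is only meant to show
why removing an lflow stays proportional to the number of resources it
references:

#include <stdlib.h>
#include "openvswitch/hmap.h"
#include "openvswitch/list.h"
#include "openvswitch/uuid.h"

/* One entry per (resource name, lflow) pair.  The same node is linked
 * into two lists at once, so it can be reached from either direction. */
struct lflow_ref_list_node {
    struct ovs_list ref_list;    /* In ref_lflow_node's list of lflows. */
    struct ovs_list lflow_list;  /* In lflow_ref_node's list of refs. */
    struct uuid lflow_uuid;
    char *ref_name;              /* E.g. an address set name. */
};

/* Node of ref_lflow_table: resource (type, name) -> lflows that use it. */
struct ref_lflow_node {
    struct hmap_node node;       /* Hashed on reference type + name. */
    char *ref_name;
    struct ovs_list lflow_head;  /* Holds lflow_ref_list_node's. */
};

/* Node of lflow_ref_table: lflow UUID -> resources it references. */
struct lflow_ref_node {
    struct hmap_node node;       /* Hashed on the lflow UUID. */
    struct uuid lflow_uuid;
    struct ovs_list ref_head;    /* Holds lflow_ref_list_node's. */
};

/* Dropping an lflow walks only its own reference list, so the cost is
 * O(number of address sets/port groups referenced by that lflow).
 * (Cleanup of ref_lflow_node entries that become empty is omitted.) */
static void
lflow_refs_delete(struct lflow_ref_node *lrn)
{
    struct lflow_ref_list_node *rlln, *next;

    LIST_FOR_EACH_SAFE (rlln, next, lflow_list, &lrn->ref_head) {
        ovs_list_remove(&rlln->ref_list);    /* Unlink from the ref side. */
        ovs_list_remove(&rlln->lflow_list);  /* Unlink from the lflow side. */
        free(rlln->ref_name);
        free(rlln);
    }
}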
I agree that maintaining the cross reference has some overhead, but I
don't see a scaling issue in this case. Adding entries to the cross
reference table is a by-product of consider_logical_flow() when parsing
the lflow, and deleting the entries is also efficient: O(N), where N is
the number of address sets used by an lflow, which in most cases should
be a very small number (correct me if I am wrong). As to memory
consumption, it maintains only the mapping between resource names and
lflow uuids, so I don't expect it to be a significant cost either. Could
you explain a little more about the "3- or 4-chain of dependencies" example?
My thought was along the lines that table A references table B, which
references table C. A change in table A might result in a change to
table B, which then results in a change to table C. In my head, I
thought this would mean having to maintain a large series of hashmaps
that cross-referenced each other. I realize this isn't correct, though,
and that as long as the dependency chain is linear, this isn't any
different from what you already have proposed here.
In reality, I can't think of any example changes in the southbound
database that would cause such a series of events, but I may not be
thinking hard enough :)
For the "cross-cutting concern", I don't see it that way. I see it as a
pattern of change handler implementation. In general, the output of an
engine node is the result of a "join" operation of its inputs. When
there are multiple inputs and one of them changes, for a change handler
to compute the output incrementally, we will need to use the changed row
to probe all the other inputs to update the final output. For the
address-set and port-group handlers, it is a join between two tables only,
and the cross reference table is built to make the probing efficient in the
change handler. The cross reference table is also generalized so that
any resources referenced by logical flows can reuse the same data
structure and interfaces, and now it is reused by both address-sets and
port-groups. We can generalize it further for other mappings if needed.
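As a concrete (and purely schematic) example of that probing, an
address-set change handler could look roughly like this, reusing the
structures sketched earlier; the function name is made up and the
reprocessing step stands in for consider_logical_flow():

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include "hash.h"                     /* hash_string(), from lib/hash.h. */
#include "openvswitch/hmap.h"
#include "openvswitch/list.h"

/* Schematic handler for a changed address set 'as_name'.  It probes the
 * other input (the logical flows) through ref_lflow_table and reprocesses
 * only the flows that actually reference the set.  Returning true signals
 * the change was handled incrementally, per the handler convention in the
 * series. */
static bool
handle_address_set_change(const char *as_name, struct hmap *ref_lflow_table)
{
    uint32_t hash = hash_string(as_name, 0);
    struct ref_lflow_node *rfn;

    HMAP_FOR_EACH_WITH_HASH (rfn, node, hash, ref_lflow_table) {
        if (strcmp(rfn->ref_name, as_name)) {
            continue;                 /* Hash collision with another name. */
        }

        struct lflow_ref_list_node *rlln;
        LIST_FOR_EACH (rlln, ref_list, &rfn->lflow_head) {
            /* Reprocess just this flow, e.g. via consider_logical_flow()
             * keyed on rlln->lflow_uuid. */
        }
        return true;
    }
    return true;                      /* No lflow references this set. */
}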
Yes, and this sort of thinking is what I had over the weekend that made
me have a "Eureka!" moment and realize what I had been missing here.
I had been looking at the address set change handler and thinking of the
change handler as being an "owner" of address set data. The reason is
that the engine node is tied directly to updates to the address set
table. It felt like it was overstepping its boundaries by then stepping
through data that I thought was owned by the logical flow change
handler. The fact that both change handlers acted on the same data just
struck me as wrong.
However, I realized I need to stop thinking about data ownership in that
sort of way when it comes to change handlers. Engine nodes do have
ownership like I was imagining in their run() method, but change
handlers are very different. They are responsible for analyzing not
just the data that has changed, but also the relationship that data
has with other data. That other data may or may
not be tied to other engine nodes.
If we really want to make it more general, I think the answer is the
datalog approach. It would be great if it could be implemented that way,
but I am pessimistic about it being applied to ovn-controller on a
practical timeline, given that ovn-controller is more complex in terms
of both data sources and processing logic compared with ovn-northd. And
I think it is practical and simple to implement the probing for the most
frequent scenarios, as demonstrated by this RFC.
I agree. I don't think datalog is the correct approach for ovn-controller.
>
> If we generalize things a bit, there are likely to be two ways
dependencies manifest in the database. In this particular case, text in
one row expands to data of a separate database row. The other case would
be where a database row contains the UUID (or list of UUIDs) of other
database rows.
>
> For the textual case, I think the easiest way to handle this is to
replace the text with what it expands to earlier than when we currently
do it. Consider that a logical flow references address set $foo.
Currently, the logical flow in the southbound database has the text
"$foo" in it. If $foo were replaced with the actual addresses from the
address set, then when an address set changes, the text of the logical
flow would change as well, thus resulting in a direct change of the
logical flow. A less disruptive version of this might be to use some
reserved character automatically in the logical flow match followed by a
sequence number. So for instance, if a logical flow were set up to
reference address set $foo, then the actual logical flow might be
something like $foo?1. Then if northbound address set foo changes, the
logical flow could be updated to $foo?2 by ovn-northd. Again, the
textual change in the logical flow would result in triggering the change
tracker.
>
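To make the idea concrete, the suffixing itself is trivial; something
like the following on the ovn-northd side, where the generation number
is bumped whenever the northbound address set changes. This is purely
illustrative; nothing like this exists today:

#include <stdio.h>

/* The '?' separator and the per-address-set generation number come from
 * the example above ("$foo?1" -> "$foo?2").  Bumping 'generation' makes
 * every southbound lflow match that embeds the reference change
 * textually, which trips ordinary lflow change tracking in
 * ovn-controller. */
static void
format_address_set_ref(const char *as_name, unsigned int generation,
                       char *buf, size_t buf_size)
{
    /* E.g. format_address_set_ref("foo", 2, buf, sizeof buf) -> "$foo?2". */
    snprintf(buf, buf_size, "$%s?%u", as_name, generation);
}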
This proposal is interesting and I think it is a valid alternative. It
is trying to implement the probing without maintaining a cross reference
table in ovn-controller. In fact it moves the effort of building the
reference table from ovn-controller to maintaining the sequence number
for each address-set/port-group resource in ovn-northd. I am just not
sure if this makes the system simpler or more complex. I will need to
think more about it.
Yes, having thought about this some more, I agree that this could just
be trading one complexity for another. Plus, aside from address sets and
port groups, I'm not sure that there are any other text-expansion-type
references in the southbound database. So engineering a big solution for
this may not have a lot of bang for your buck. It may be worth
workshopping just to see, though.
> For the database referencing case, it would be nice if the IDL change
tracking code could automatically do this for us. This way if record foo
has a column that references row bar, then if bar changes, we would be
told that foo also changed. This strikes me as difficult to implement
and could result in some interesting dependency graphs within the IDL
code, though.
>
> What do you think?
>
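For what it's worth, the propagation being described would conceptually
look like the sketch below. This is hypothetical: the OVSDB IDL does not
do this today and these are not real IDL types; it only illustrates the
"bar changed, therefore foo changed" behavior:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IDL-tracked row. */
struct tracked_row {
    bool changed;                        /* Set by ordinary change tracking. */
    struct tracked_row **referenced_by;  /* Rows whose columns point at us. */
    size_t n_referenced_by;
};

/* If 'row' (e.g. bar) changed, also report every row that references it
 * (e.g. foo) as changed. */
static void
propagate_reference_changes(struct tracked_row *row)
{
    for (size_t i = 0; i < row->n_referenced_by; i++) {
        struct tracked_row *referrer = row->referenced_by[i];
        if (!referrer->changed) {
            referrer->changed = true;
            /* Follow chains: a change in C marks B, which marks A, etc. */
            propagate_reference_changes(referrer);
        }
    }
}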
For the database referencing case, it does not seem directly related to your
concern regarding address-set handling (or port-group handling). Please
correct me if I misunderstood something here. But I agree with the idea of
utilizing and improving the IDL capability to build the dependency graph
for table references, and this is exactly in my TODO as mentioned in the
cover letter:
You are correct that this does not apply to address set or port group
referencing by logical flows. The relationship between logical flows and
address sets and port groups is not currently expressible at the IDL
level. I was thinking ahead a bit about how other tables may refer to
each other. I foresaw similar structures in change handlers for those
tables and wondered if that could be handled at the IDL level instead.
"For exposing the dependencies introduced by reference access, it is a big
TODO item and it is the major reason this patch series is RFC only."
Thanks,
Han