On 05/23/2017 07:23 AM, Chris Dent wrote:
That "higher dev cost" is one of my objections to the 'active'
approach but it is another implication that worries me more. If we
limit deployer architecture choices at the persistence layer then it
seems very likely that we will be tempted to build more and more
power and control into the persistence layer rather than in the
so-called "business" layer. In my experience this is a recipe for
ossification. The persistence layer needs to be dumb and
replaceable.

Err, in my experience, having a *completely* dumb persistence layer -- i.e. one that tries to paper over the differences between, say, relational and non-relational stores -- is a recipe for disaster. The developer just ends up writing join constructs in the business layer instead of using a relational data store the way it is intended to be used. The same goes for aggregate operations. [1]

Now, if what you're referring to is "don't use vendor-specific extensions in your persistence layer", then yes, I agree with you.

Best,
-jay

[1] Witness the join constructs in the Kubernetes Golang code that work around etcd not being a relational data store:

https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/deployment_controller.go#L528-L556

Instead of a single SQL statement:

SELECT p.* FROM pods AS p
JOIN deployments AS d
ON p.deployment_id = d.id
WHERE d.name = $name;

the deployment controller code has to read every Pod message from etcd, loop through each one, and build the list of Pods that match the Deployment being searched for.
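
For illustration only -- the types and helper below are hypothetical simplifications I'm making up, not the actual controller code -- the "join in the business layer" ends up shaped roughly like this:

package main

import "fmt"

// Hypothetical, simplified stand-ins for the real API objects.
type Deployment struct {
    Name     string
    Selector map[string]string // label selector
}

type Pod struct {
    Name   string
    Labels map[string]string
}

// labelsMatch reports whether every key/value pair in the selector is
// present in the Pod's labels.
func labelsMatch(selector, labels map[string]string) bool {
    for k, v := range selector {
        if labels[k] != v {
            return false
        }
    }
    return true
}

// podsForDeployment is the client-side equivalent of the SQL join above:
// read *every* Pod back from the store and keep only the ones whose
// labels match the Deployment's selector.
func podsForDeployment(d Deployment, allPods []Pod) []Pod {
    var matched []Pod
    for _, p := range allPods {
        if labelsMatch(d.Selector, p.Labels) {
            matched = append(matched, p)
        }
    }
    return matched
}

func main() {
    d := Deployment{Name: "web", Selector: map[string]string{"app": "web"}}
    pods := []Pod{
        {Name: "web-1", Labels: map[string]string{"app": "web"}},
        {Name: "db-1", Labels: map[string]string{"app": "db"}},
    }
    fmt.Println(podsForDeployment(d, pods)) // prints the single matching Pod
}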

Similarly, the Kubernetes API does not support any aggregate (SUM, GROUP BY, etc.) functionality. Instead, clients are required to perform these kinds of calculations in memory. This is because etcd, being an (awesome) key/value store, is not designed for aggregate operations (just as Cassandra, for example, only supports a limited set of aggregate operations).
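
Again, a hypothetical sketch (the Pod type here is a made-up stand-in, not the real API object) of what a GROUP BY looks like when the client has to do it in memory:

package main

import "fmt"

// Hypothetical, simplified Pod; not the real API object.
type Pod struct {
    Name     string
    NodeName string
}

// countPodsPerNode is the in-memory equivalent of
//   SELECT node_name, COUNT(*) FROM pods GROUP BY node_name;
// every Pod record has to be read back to the client and counted there.
func countPodsPerNode(allPods []Pod) map[string]int {
    counts := make(map[string]int)
    for _, p := range allPods {
        counts[p.NodeName]++
    }
    return counts
}

func main() {
    pods := []Pod{
        {Name: "web-1", NodeName: "node-a"},
        {Name: "web-2", NodeName: "node-a"},
        {Name: "db-1", NodeName: "node-b"},
    }
    fmt.Println(countPodsPerNode(pods)) // map[node-a:2 node-b:1]
}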

My point here is not to denigrate Kubernetes. Far from it. They (to date) have a relatively shallow relational schema, and doing join and index maintenance [2] operations in client-side code has so far been a cost the project has been OK carrying. The point I'm trying to make is that the choice of data store semantics (relational or not, columnar or not, eventually consistent or not, etc.) *does make a difference* to the architecture of a project, its deployment, and the amount of code the project needs to maintain to properly handle its data schema. There's no way -- in my experience -- to make a "persistence layer" that papers over these differences and still ends up being useful.

[2] In Kubernetes, all services are required to keep all relevant data in memory:

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/principles.md

This means that code that maintains a bunch of in-memory indexes of various data objects ends up being placed into every component. Here's an example of this in the kubelet's (the equivalent-ish of the nova-compute daemon) pod manager, which keeps an index of pods and mirror pods in memory:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L104-L114

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L159-L181
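
To give a rough idea of the shape of that bookkeeping -- this is a hypothetical, much-simplified sketch, not the real pod manager code -- every component that wants lookups by more than one key ends up writing something like:

package main

import (
    "fmt"
    "sync"
)

// Hypothetical, much-simplified Pod; not the real API object.
type Pod struct {
    UID      string
    FullName string // "namespace/name"
}

// podIndex keeps two in-memory indexes over the same Pods, guarded by a
// lock, so callers can look a Pod up by either key.
type podIndex struct {
    mu            sync.RWMutex
    podByUID      map[string]*Pod
    podByFullName map[string]*Pod
}

func newPodIndex() *podIndex {
    return &podIndex{
        podByUID:      make(map[string]*Pod),
        podByFullName: make(map[string]*Pod),
    }
}

// Add has to keep both maps consistent with each other; every component
// that needs this kind of lookup carries (and tests) code like it.
func (i *podIndex) Add(p *Pod) {
    i.mu.Lock()
    defer i.mu.Unlock()
    i.podByUID[p.UID] = p
    i.podByFullName[p.FullName] = p
}

func (i *podIndex) GetByFullName(name string) (*Pod, bool) {
    i.mu.RLock()
    defer i.mu.RUnlock()
    p, ok := i.podByFullName[name]
    return p, ok
}

func main() {
    idx := newPodIndex()
    idx.Add(&Pod{UID: "1234", FullName: "default/web-1"})
    if p, ok := idx.GetByFullName("default/web-1"); ok {
        fmt.Println(p.UID) // 1234
    }
}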
