On 4/26/19 10:43 AM, Thomas Lamprecht wrote:
On 4/26/19 at 8:21 AM, Dominik Csapak wrote:
similar to how we handle the cluster wide tasklist and rrd data,
have an interface that can sync data across the cluster

this data is only transient and will not be written to disk

we can use this for a number of things, e.g. getting the locks of the
guests clusterwide, listing ceph services across the cluster, etc.

How? One needs to write it on lock, and if it's only temporary it does not
seem that great for things like locks. Also, you start to broadcast
everything twice to all nodes; the lock info is already there in the memdb
store, no need to run a parallel heuristic mechanism for that.


locks were just an example, but i would have let pvestatd broadcast the
locks a second time (where we already have the config parsed), just to show
them in /cluster/resources. that view is already not synchronous for remote
nodes, so a little delay there does not change the general behavior, and
every other bit of information we have there about the other cluster members
gets read with a similar mechanism (just the rrd data instead of the custom
lock one)
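
for illustration, roughly the usage i have in mind (just a sketch; the
'locks' key and the pvestatd hook here are illustrative, not part of this
patch):

    # in pvestatd, where the guest configs are parsed anyway:
    my $locks = { 100 => 'backup', 101 => 'migrate' };  # vmid => lock type
    PVE::Cluster::broadcast_data('locks', $locks);

    # later, e.g. when assembling /cluster/resources:
    my $locks_by_node = PVE::Cluster::get_data('locks');
    # => { node1 => { 100 => 'backup', ... }, node2 => { ... } }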



Signed-off-by: Dominik Csapak <d.csa...@proxmox.com>
---
  data/PVE/Cluster.pm | 47 +++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 47 insertions(+)

diff --git a/data/PVE/Cluster.pm b/data/PVE/Cluster.pm
index 5af11e6..ea36a0a 100644
--- a/data/PVE/Cluster.pm
+++ b/data/PVE/Cluster.pm
@@ -540,6 +540,53 @@ sub get_nodelist {
      return [ keys %$nodelist ];
  }
+# best effort data store for cluster
+# this data is gone if the pmxcfs is restarted, but only the local data,

if one is restarted or if all are? Else, who cleans up locks, or other data,
on node failure (in whatever form that takes)? Otherwise this _will_ have
very confusing outcomes...

the way pmxcfs handles the kv store is that for the local node it looks up
the local kvhash (which is empty after a restart), but for remote nodes it
looks them up in the cluster-synchronized kvhash of that node

i was not sure if this is intended or a bug, but i treated it as if it were
intended. the whole thing is also not really documented (imported from svn)

so we either have to look up the local data separately,
or make sure we broadcast it once before looking it up,
or just live with the fact that after a restart the data is
not current until we broadcast it for the first time
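
a rough sketch of the first option (hypothetical, not part of this patch):
remember the last value we broadcast ourselves and prefer that over the
possibly stale synchronized kvhash for the local node:

    # module-level cache of what this node last broadcast; it lives only
    # as long as the daemon, which matches the lifetime of the kv data
    my $local_kv = {};

    sub broadcast_data {
        my ($key, $data) = @_;
        $local_kv->{$key} = $data;  # keep our own copy around
        eval { $ipcc_update_status->($key, $data) };
        warn $@ if $@;
    }

    # and in the get path, for the local node:
    # $res->{$node} = $local_kv->{$key}
    #     if $node eq $nodename && defined($local_kv->{$key});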


+# so we should not use this for very important data
+sub broadcast_data {

broadcast_node_kv ?

ok


+    my ($key, $data) = @_;
+
+    my $size = length(encode_json($data));
+    if ($size >= (32 * 1024)) {
+       warn "data for '$key' too big\n";
+       return;
+    }
+
+    eval {
+       &$ipcc_update_status($key, $data);

so we just allow overwriting existing keys? how about a prefix like we have
for "rrd/"

yeah makes sense
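
i.e. something like this (sketch; the 'kv/' prefix is just a placeholder,
analogous to the existing "rrd/" keys):

    # namespace the key so callers cannot clobber unrelated status entries
    $ipcc_update_status->("kv/$key", $data);
    # (the same prefix then has to be applied again in the get path)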


+    };
+
+    warn $@ if $@;
+}
+
+sub get_data {

get_node_kv ?

ok


+    my ($key, $nodename) = @_;
+
+    my $res = {};
+    my $sub = sub {

maybe a name which actually says something would help reading this ;-)

yeah, i guess sometimes i am a bit too lazy with names ;)


+       my ($node) = @_;
+       eval {
+           my $raw = &$ipcc_get_status($key, $node);

use new coderef call syntax, don't mix it in the same patch..
(above &$ vs below $sub->())

of course (copy-is-my-hobby)


+           my $data = decode_json($raw) if $raw;
+           $res->{$node} = $data;
+       };
+       my $err = $@;
+       syslog('err', $err) if $err;
+    };
+
+    if ($nodename) {
+       $sub->($nodename);
+    } else {
+       my $nodelist = get_nodelist();
+
+       foreach my $node (@$nodelist) {
+           $sub->($node);
+       }

so this is not arbitrary data indexed by key, but a value store indexed with
($key, $node), i.e., the same key on different nodes can hold different data.
that can be OK, but it needs to be noted, e.g., through more specific method
names

ok, i just wanted to not lose the node info without becoming too specific
about the structure of the data, but yes, a different name makes sense.
maybe we can have multiple versions of this with different structures? only
if we need it, of course
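
to spell out the shape (illustrative values, not from this patch): the same
key can hold different data on each node, and the result keeps that per-node
structure:

    my $res = get_node_kv('ceph-services');
    # => {
    #        node1 => { mon => 'running', osd => 'running' },
    #        node2 => { mon => 'stopped' },
    #    }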


not yet really sure about this, or better said, I'm sure I don't want this for
locks, but for things like ceph service states it could be OK.

yeah, i am not sure about this either, hence i sent it as an RFC ;)
i would still prefer it for locks over parsing the configs in the api call,
though


+    }
+
+    return $res;
+}
+
  # $data must be a chronological descending ordered array of tasks
  sub broadcast_tasklist {
      my ($data) = @_;



