Re: [DISCUSS] KIP-78: Cluster Id

Dong Lin Fri, 02 Sep 2016 23:45:08 -0700

Hey Ismael,

Thanks for your reply. Please see my comment inline.


On Fri, Sep 2, 2016 at 8:28 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Dong,
>
> Thanks for your feedback. Comments inline.
>
> On Thu, Sep 1, 2016 at 7:51 PM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > I share the view with Harsha and would like to understand how the current
> > approach of randomly generating cluster.id compares with the approach of
> > manually specifying it in meta.properties.
> >
>
> Harsha's suggestion in the thread was to store the generated id in
> meta.properties, not to manually specify it via meta.properties.
>
> >
> > I think one big advantage of defining it manually in zookeeper is that we
> > can easily tell which cluster it is by simply looking at the sensor name,
> > which makes it more useful to the auditing or monitoring use-case that
> > this KIP intends to address.
>
>
> If you really want to customise the name, it is possible with the current
> proposal: save the appropriate znode in ZooKeeper before a broker
> auto-generates it. We don't encourage that because once you have a
> meaningful name, there's a good chance that you may want to change it in
> the future. And things break down at that point. That's why we prefer
> having a generated, unique and immutable id complemented by a changeable
> human readable name. As described in the KIP, we think the latter can be
> achieved more generally via resource tags (which will be a separate KIP).
>
Can you elaborate on what will break down if we need to change the name?

Even if we cannot change the name because something will break down in that
case, it still seems better to read the id from config than to use a
randomly generated id. With my suggested solution, the user can simply choose
not to change the name and make sure each cluster has a unique id. With your
proposal, the old cluster.id has to be stored somewhere and manually restored
in zookeeper in some scenarios. What do you think?


> > On the other hand, with a generated id you can only tell whether two
> > sensors are measuring the same cluster or not. Also note that even this
> > goal is not easily guaranteed, because you need an external mechanism to
> > manually re-generate the znode with the old cluster.id if the znode is
> > deleted or if the same cluster (w.r.t. purpose) is changed to use a
> > different zookeeper.
> >
>
> If we assume that znodes can be deleted at random, the cluster id is
> probably the least of one's worries. And yes, when moving to a
> different ZooKeeper while wanting to retain the cluster id, you would have
> to set the znode manually. This doesn't seem too onerous compared to the
> other work you will have to do for this scenario.
>
Maybe this work is not much compared to other work. But we can agree that
no work is better than little work, right? I am interested to see if we can
avoid the work and still meet the motivation and goals of this KIP.


> > I read your reply to Harsha but still I don't fully understand your
> > concern with that approach. I think the broker can simply register
> > cluster.id in that znode if it is not specified yet, in the same way that
> > this KIP proposes to do it, right? Can you please elaborate more about
> > your concern with this approach?
> >
>
> It's a bit difficult to answer this comment because it seems like the
> intent of your suggestion is different than Harsha's.
>
> I am not necessarily opposed to storing the cluster id in meta.properties
> (note that we have one meta.properties per log.dir), but I think there are
> a number of things that need to be discussed and I don't think we need to
> block KIP-78 while that takes place. Delivering features incrementally is a
> good thing in my opinion (KIP-31/32, KIP-33 and KIP-79 are good recent
> examples).
>

If I understand it right, the motivation of this KIP is to allow a cluster
to be uniquely identified. This is a useful feature and I am not asking for
anything beyond this scope. It is just that reading cluster.id from config
seems to be a better way to meet the motivation and all the goals described
in the KIP. More specifically, a configured cluster.id not only lets the
user distinguish between different clusters, it also lets the user identify
a particular cluster. In comparison, a randomly generated cluster.id lets
the user distinguish clusters with a little more effort, but doesn't let the
user identify a cluster by simply reading e.g. the sensor name. Did I miss
something here?


>
> Ismael
>
> P.S. For what it's worth, the following version of the KIP includes an
> incomplete description (it assumes a single meta.properties, but there
> could be many) of what the broker would have to do if we wanted to save to
> meta.properties and potentially restore the znode from it. The state space
> becomes a lot more complex, increasing potential for bugs (we had a few for
> generated broker ids). In contrast, the current proposal is very simple and
> doesn't prevent us from introducing the additional functionality later.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868433
>

IMO reading cluster.id from config should be as easy as reading the broker
id from config. Storing a configured cluster.id in the znode requires the
same amount of effort as storing a randomly generated cluster.id there.
Maybe I missed something here. Can you point me to the section of the KIP
that explains why it is more difficult to read cluster.id from config?
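
To make the comparison concrete, here is a minimal Python sketch, not taken
from the KIP. A plain dict stands in for ZooKeeper, and the znode path, the
function names, the id format and the choice to fail on a mismatch are all
assumptions made for illustration. It only shows that the registration step
is shared by both approaches and that a configured id survives a move to an
empty store without a manual restore. The mismatch branch is one place where
the state space grows.

```python
import base64
import uuid
from typing import Dict, Optional

# Hypothetical znode path; a plain dict stands in for ZooKeeper here.
CLUSTER_ID_ZNODE = "/cluster/id"


def generate_cluster_id() -> str:
    """Return a random URL-safe id (illustrative; the real format may differ)."""
    raw = uuid.uuid4().bytes
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def resolve_cluster_id(store: Dict[str, str],
                       configured: Optional[str] = None) -> str:
    """Return the cluster id, registering it in the store if it is absent."""
    existing = store.get(CLUSTER_ID_ZNODE)
    if existing is not None:
        if configured is not None and configured != existing:
            # Assumption: a mismatch is fatal. The thread does not settle this.
            raise ValueError(
                f"configured id {configured!r} != stored id {existing!r}")
        return existing
    # Both approaches share this step; only the source of the id differs.
    cluster_id = configured if configured is not None else generate_cluster_id()
    store[CLUSTER_ID_ZNODE] = cluster_id
    return cluster_id


if __name__ == "__main__":
    old_zk: Dict[str, str] = {}
    new_zk: Dict[str, str] = {}  # e.g. after moving to a different ZooKeeper

    # Generated id: the empty store gets a new id unless restored by hand.
    print(resolve_cluster_id(old_zk) == resolve_cluster_id(new_zk))

    # Configured id: the same value is re-registered with no manual step.
    old_zk.clear()
    new_zk.clear()
    same = (resolve_cluster_id(old_zk, "metrics-cluster")
            == resolve_cluster_id(new_zk, "metrics-cluster"))
    print(same)
```

A real broker would need an atomic create-if-absent against ZooKeeper rather
than the check-then-set on a dict used above.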
