Grant,

2. The issues I was thinking of are the following. (a) Say the controller
has topic deletion disabled and a topic deletion request is submitted to
ZK. In this case, the controller will ignore the request, but a broker may
still pick up the topic deletion marker in a transient window. (b) Suppose
that a topic is deleted and then recreated immediately. It is possible for
a broker to see the newly created topic and then the previous topic
deletion marker in a transient window. Thinking about this a bit more,
both windows are transient, so this may not be a big concern, and I am ok
with this as long as the interim solution is not too complicated.
Another thing to think through: if a topic is marked for deletion, do we
still return the partition-level metadata?
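
For reference, the marker both of these races hinge on is just the child
list under /admin/delete_topics in ZK. A minimal sketch of reading it
directly (the kind of direct ZK access KIP-4 is trying to remove from
clients):

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch only: reads the deletion markers a broker or client can
    // observe. The transient windows in (a) and (b) come from reading
    // this child list at the wrong moment relative to the controller.
    public class DeleteMarkerSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
            List<String> marked = zk.getChildren("/admin/delete_topics", false);
            System.out.println("topics marked for deletion: " + marked);
            zk.close();
        }
    }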

3. Your explanation on controller id seems reasonable to me.

5. The issue is the following. Suppose you have a partition with 3
replicas (4, 5, 6), the leader is on replica 4, and replica 5 is down.
Currently, the broker will send a REPLICA_NOT_AVAILABLE error code and
include only replicas 4 and 6 in the assigned replicas. It's more
intuitive to send no error code and include 4, 5, 6 in the assigned
replicas in this case.
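
To make the contrast concrete, a rough sketch of the client-side handling
(illustrative names, not the actual client code):

    import java.util.Arrays;
    import java.util.List;

    // Sketch: the "natural" metadata-handling logic described above.
    public class MetadataHandlingSketch {
        static final short NO_ERROR = 0;

        static void handlePartition(short error, int leader,
                                    List<Integer> replicas) {
            if (error != NO_ERROR) {
                // Current behavior: a dead follower (replica 5) yields
                // REPLICA_NOT_AVAILABLE, so the client backs off even
                // though leader 4 is fine.
                System.out.println("backing off and refreshing metadata");
                return;
            }
            // Proposed behavior: no error and the full assigned list
            // [4, 5, 6], so the client goes straight to the leader.
            System.out.println("assigned replicas: " + replicas);
            System.out.println("sending request to leader " + leader);
        }

        public static void main(String[] args) {
            handlePartition(NO_ERROR, 4, Arrays.asList(4, 5, 6));
        }
    }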

Thanks,

Jun

On Mon, Apr 4, 2016 at 8:33 AM, Grant Henke <ghe...@cloudera.com> wrote:

> Hi Jun,
>
> Please see my responses below:
>
> > Hmm, I am not sure about the listener approach. It ignores configs like
> > enable.topic.deletion and also opens the door for potential ordering
> > issues since now there are two separate paths for propagating the
> > metadata to the brokers.
>
>
> This mechanism is very similar to how deletes are tracked on the controller
> itself. It is also the same way ACLs are tracked on brokers in the default
> implementation. I am not sure I understand what ordering issue there could
> be. This is used to report which topics are marked for deletion, which today
> has no dependency on enable.topic.deletion. I agree that the delete
> mechanism in Kafka has a lot of room for improvement, but the goal of this
> change is just to enable reporting it to the user, not to fix/improve
> existing issues. If you have an alternate approach that does not require
> major changes to the controller code, I would be open to investigating it.
>
> > Could we just leave out markedForDeletion for now? In the common case,
> > if a topic is deleted, it will only be in the markedForDeletion state
> > for a few milliseconds anyway.
>
>
> I don't think we should leave it out. The point of these changes is to
> prevent a user from needing to talk directly to zookeeper. We need a way
> for a user to see if a topic has been marked for deletion. Given the issues
> with the current delete implementation, it's fairly common for a topic to
> remain marked for deletion for quite some time.
>
> > Yes, for those usages, it just seems a bit weird for the client to
> > issue a MetadataRequest to get the controller info since it doesn't
> > need any topic metadata.
>
>
> Why does this seem weird? The MetadataRequest is the request used to
> discover the cluster and metadata about that cluster, regardless of the
> topics you are interested in, if any. In fact, a big motivation for the
> change to allow requesting "no topics" is that the existing producer and
> consumer often want to learn about the cluster without asking for topic
> metadata, and today that means they request all topics.
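>
> (Concretely: in version 0 an empty topic array already means "all
> topics", so there is no way to ask for none. The proposal makes that
> distinction explicit in the next version, e.g. a null array for all
> topics and an empty array for no topics; the KIP wiki has the exact
> encoding.)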
>
> > 5. The issue is that for a client, when handling a metadata response, the
> > natural logic is if there is any error in the response, go to the error
> > handling path (e.g., back off and refresh metadata). Otherwise, get the
> > leader info and initiate a request to the leader if leader is available.
> > If you look at the current logic in MetadataCache.getPartitionMetadata(),
> > if an assigned replica is not alive, we will send a REPLICA_NOT_AVAILABLE
> > error code in the response. If the client follows the above logic, it
> > will keep doing the error handling even though there is nothing wrong
> > with the leader. A better behavior is to simply return the list of
> > replica ids with no error code in this case.
>
>
> To be sure I understand this correctly: instead of returning the complete
> list of replicas, including the ones that errored as unavailable, you are
> suggesting to drop the unavailable ones, return just the replicas with no
> errors, and return no error code on the partition. Is that correct?
>
> Under what scenario does the MetadataCache have a replica that is not
> available?
>
> Thanks,
> Grant
>
>
>
>
>
> On Mon, Apr 4, 2016 at 12:25 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Grant,
> >
> > 2. Hmm, I am not sure about the listener approach. It ignores configs
> > like enable.topic.deletion and also opens the door for potential ordering
> > issues since now there are two separate paths for propagating the
> > metadata to the brokers. Could we just leave out markedForDeletion for
> > now? In the common case, if a topic is deleted, it will only be in the
> > markedForDeletion state for a few milliseconds anyway.
> >
> > 3. Yes, for those usages, it just seems a bit weird for the client to
> > issue a MetadataRequest to get the controller info since it doesn't need
> > any topic metadata.
> >
> > 5. The issue is that for a client, when handling a metadata response, the
> > natural logic is if there is any error in the response, go to the error
> > handling path (e.g., back off and refresh metadata). Otherwise, get the
> > leader info and initiate a request to the leader if leader is available.
> > If you look at the current logic in MetadataCache.getPartitionMetadata(),
> > if an assigned replica is not alive, we will send a REPLICA_NOT_AVAILABLE
> > error code in the response. If the client follows the above logic, it
> > will keep doing the error handling even though there is nothing wrong
> > with the leader. A better behavior is to simply return the list of
> > replica ids with no error code in this case.
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Sun, Apr 3, 2016 at 6:29 PM, Grant Henke <ghe...@cloudera.com> wrote:
> >
> > > Responding to a few of the other comments:
> > >
> > > > it seems that you propagated the topic deletion marker by having the
> > > > replicaManager read from ZK directly. It seems that it would be
> > > > simpler/consistent if the controller propagates that information
> > > > directly through UpdateMetadataRequest.
> > >
> > >
> > > I was told that I should not try to modify controller logic with KIP-4
> > > changes. It was indicated that a larger controller rewrite and testing
> > > effort was planned and those changes should be considered then. Since
> > > marking a topic for deletion doesn't flow through the controller, and
> > > therefore not through the UpdateMetadataRequest, it would take quite a
> > > bit of change. We would need to trigger a new UpdateMetadataRequest
> > > every time a new topic is marked for deletion.
> > >
> > > Instead I added a listener to maintain a cache of the topic deletion
> > > znodes in the ReplicaManager, where the existing UpdateMetadataRequests
> > > are handled. This would make it easy to swap out later, once the data
> > > is a part of that request, and have minimal impact in the meantime.
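> > >
> > > Roughly, the listener amounts to something like this (a sketch, not
> > > the actual patch; it uses the raw ZooKeeper API where the patch uses
> > > Kafka's ZK utilities):
> > >
> > >     import java.util.HashSet;
> > >     import java.util.Set;
> > >     import org.apache.zookeeper.WatchedEvent;
> > >     import org.apache.zookeeper.Watcher;
> > >     import org.apache.zookeeper.ZooKeeper;
> > >
> > >     // Sketch: cache the set of topics pending deletion, refreshed
> > >     // whenever the children of /admin/delete_topics change.
> > >     public class DeletionCacheSketch implements Watcher {
> > >         private final ZooKeeper zk;
> > >         private volatile Set<String> pending = new HashSet<>();
> > >
> > >         public DeletionCacheSketch(ZooKeeper zk) throws Exception {
> > >             this.zk = zk;
> > >             refresh();
> > >         }
> > >
> > >         private void refresh() throws Exception {
> > >             // Passing 'this' re-registers the watch on every read.
> > >             pending = new HashSet<>(
> > >                 zk.getChildren("/admin/delete_topics", this));
> > >         }
> > >
> > >         public void process(WatchedEvent event) {
> > >             try {
> > >                 refresh();
> > >             } catch (Exception e) {
> > >                 // Real broker code would log and retry here.
> > >             }
> > >         }
> > >
> > >         public boolean isMarkedForDeletion(String topic) {
> > >             return pending.contains(topic);
> > >         }
> > >     }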
> > >
> > >
> > > > Could you add a description on how controller id will be used in the
> > > > client?
> > >
> > >
> > > I will add it to the wiki. Today metrics are the only way to access
> > > this piece of data. It is useful information about the cluster for many
> > > reasons. Having programmatic access to identify the controller is
> > > helpful for automation. For example, it can be used during rolling
> > > restart logic to shut down the controller last to prevent multiple
> > > failovers. Beyond automation, it can be leveraged in KIP-4 to route
> > > admin requests to the controller broker.
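> > >
> > > For example, the rolling-restart case is just the following
> > > (hypothetical helper names; the point is only that the controller id
> > > comes from the metadata response rather than from ZK or JMX):
> > >
> > >     import java.util.Arrays;
> > >     import java.util.List;
> > >
> > >     // Sketch: restart every broker except the controller, then the
> > >     // controller last, so there is at most one controller failover.
> > >     public class RollingRestartSketch {
> > >         static void restartBroker(int id) {
> > >             System.out.println("restarting broker " + id);
> > >         }
> > >
> > >         public static void main(String[] args) {
> > >             List<Integer> brokers = Arrays.asList(1, 2, 3);
> > >             int controllerId = 2; // from the metadata response
> > >             for (int id : brokers)
> > >                 if (id != controllerId)
> > >                     restartBroker(id);
> > >             restartBroker(controllerId);
> > >         }
> > >     }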
> > >
> > > > We had a weird semantic in version 0 of MetadataRequest. If a replica
> > > > is not live, but the leader is live, we return a
> > > > ReplicaNotAvailableException error in the partition metadata. This
> > > > makes it a bit confusing for the client to parse since it has to
> > > > first check whether the leader is available or not before error code
> > > > checking. We were thinking of changing that behavior the next time we
> > > > bump up the version of MetadataRequest. Now that time has come, could
> > > > you include that in the proposal?
> > >
> > >
> > > I am not sure I completely follow the issue and requested change.
> > > Could you point me to the discussion?
> > >
> > > Thank you,
> > > Grant
> > >
> > > On Sun, Apr 3, 2016 at 8:09 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > >
> > > > Hi Jun and Ismael,
> > > >
> > > > Initially I had 2 booleans used to indicate if a topic was internal
> > > > and if a topic was marked for deletion. To save space on large
> > > > deployments, Ismael suggested I break out the internal topics and
> > > > deleted topics into their own lists. The idea was that instead of 2
> > > > bytes added per topic, in the general case the lists would be empty.
> > > > Even in those lists I still only return topics that were requested.
> > > > In fact, on the client side they are just used to translate back to
> > > > booleans. I do prefer the booleans from an expressiveness standpoint
> > > > but was not strongly opinionated on the structure.
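> > > >
> > > > The client-side translation is just set membership, e.g. (a sketch
> > > > with made-up variable names):
> > > >
> > > >     import java.util.Arrays;
> > > >     import java.util.HashSet;
> > > >     import java.util.Set;
> > > >
> > > >     // Sketch: the response carries two (usually empty) topic
> > > >     // lists, and the client turns them back into booleans.
> > > >     public class TopicFlagsSketch {
> > > >         public static void main(String[] args) {
> > > >             Set<String> internal =
> > > >                 new HashSet<>(Arrays.asList("__consumer_offsets"));
> > > >             Set<String> markedForDeletion = new HashSet<>();
> > > >
> > > >             String topic = "my-topic";
> > > >             System.out.println(topic
> > > >                 + " internal=" + internal.contains(topic)
> > > >                 + " markedForDeletion="
> > > >                 + markedForDeletion.contains(topic));
> > > >         }
> > > >     }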
> > > >
> > > > Thank you,
> > > > Grant
> > > >
> > > > On Sun, Apr 3, 2016 at 8:00 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > >> Hi Jun,
> > > >>
> > > >> A couple of comments inline.
> > > >>
> > > >> On Sun, Apr 3, 2016 at 5:55 PM, Jun Rao <j...@confluent.io> wrote:
> > > >>
> > > >> > 1. It seems a bit weird to return just a list of internal topics
> > > >> > w/o the corresponding metadata. It also seems a bit weird to return
> > > >> > the internal topics even if the client doesn't ask for it.
> > > >>
> > > >>
> > > >> Good point.
> > > >>
> > > >>
> > > >> > Would it be better to just add a flag in topic_metadata to indicate
> > > >> > whether it's an internal topic or not, and only include the internal
> > > >> > topics when they are asked for (or all topics are requested)?
> > > >> >
> > > >>
> > > >> The disadvantage of this is that we are adding one byte per topic
> > > >> even though we have a very small number of internal topics (currently
> > > >> a single internal topic). It seems a bit wasteful and particularly so
> > > >> when using regex subscriptions (since we have to retrieve all topics
> > > >> in that case).
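> > > >>
> > > >> (Rough numbers, assuming one byte per boolean and int32-prefixed
> > > >> arrays on the wire: two empty lists cost about 8 bytes per response,
> > > >> while two flags cost 2 bytes per topic, so a regex subscription that
> > > >> retrieves 10,000 topics would carry roughly 20 KB of flags.)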
> > > >>
> > > >> > 2. A similar comment on topics_marked_for_deletion. Would it be
> > > >> > better to only return them when asked for and just return a new
> > > >> > TopicDeleted error code in topic_metadata?
> > > >>
> > > >>
> > > >> I agree that this seems better.
> > > >>
> > > >> Ismael
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Grant Henke
> > > > Software Engineer | Cloudera
> > > > gr...@cloudera.com | twitter.com/gchenke |
> linkedin.com/in/granthenke
> > > >
> > >
> > >
> > >
> > > --
> > > Grant Henke
> > > Software Engineer | Cloudera
> > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
