1. Yes, this will be much easier. Okay, let's add it.

2, Okay. This will differ a little bit from the way currently
kafka-topics.sh handles alter-topic command, but I think it's
a reasonable restriction.

I'll update KIP acordingly to our weekly call.

Thanks,
Andrii Biletskyi

On Mon, Apr 20, 2015 at 10:56 PM, Jun Rao <j...@confluent.io> wrote:

> 1. Yes, lag is probably only going to be useful for the admin client.
> However, so is isr. It seems to me that we should get lag and isr from the
> same request. I was thinking that we can just extend TMR by changing
> replicas from an array of int to an array of (int, lag) pairs. Is that too
> complicated?
>
> 3. I was thinking that we just don't allow the cli to change more than one
> thing at a time. So, you will get an error if you want to change both
> partitions and configs.
>
> Thanks,
>
> Jun
>
> On Sun, Apr 19, 2015 at 8:22 AM, Andrii Biletskyi <
> andrii.bilets...@stealth.ly> wrote:
>
> > Jun,
> >
> > 1. Yes, seems we can add lag info to the TMR. But before that I wonder
> > whether there are other reasons we need this info except for reassign
> > partition command? As we discussed earlier the problem with poor
> > monitoring capabilities for reassign-partitions (as currently we only
> > inform
> > users Completed/In Progress per partition) may require separate solution.
> > We were thinking about separate Wire protocol request. And I actually
> like
> > your
> > idea about adding some sort of BrokerMetadataRequest for these purposes.
> > I actually think we can cover some other items (like rack-awareness) but
> > for
> > me it deserves a separate KIP really.
> > Also, adding Replica->Lag map per partition will make
> TopicMetadataResponse
> > very sophisticated:
> > Map[TopicName, Map[PartitionId, Map[ReplicaId, Lag]].
> > Maybe we need to leave it for a moment and propose new request rather
> than
> > making a new step towards one "monster" request.
> >
> > 2. Yes, error per topic.
> > The only question is whether we should execute at least the very first
> > alter topic
> > command from the "duplicated" topic set or return error for all ... I
> think
> > the more
> > predictable and reasonable option for clients would be returning errors
> for
> > all
> > duplicated topics.
> >
> > 3. Hm, yes. Actually we also have "change topic config" there. But it is
> > not
> > related to such "replication" commands as increase replicas or change
> > replica
> > assignment.
> > This will make CLI implementation a bit strange: if user specifies
> increase
> > partitions and change topic config in one line - taking into account 2.
> we
> > will have
> > to create two separate alter topic requests, which were designed as batch
> > requests :),
> > but probably we can live with it.
> > Okay, I will think about a separate error code to cover such cases.
> >
> > 4. We will need InvalidArgumentTopic (e.g. contains prohibited chars),
> > IAPartitions, IAReplicas, IAReplicaAssignment, IATopicConfiguration.
> > A server side implementation will be a little bit messy (like dozens "if
> > this then this
> > error code") but maybe we should think about clients at the first place
> > here.
> >
> > Thanks,
> > Andrii Biletskyi
> >
> > On Fri, Apr 17, 2015 at 1:46 AM, Jun Rao <j...@confluent.io> wrote:
> >
> > > 1. For the lags, we can add a new field "lags" per partition. It will
> > > return for each replica that's not in isr, the replica id and the lag
> in
> > > messages. Also, if TMR is sent to a non-leader, the response can just
> > > include an empty array for isr and lags.
> > >
> > > 2. So, we will just return a topic level error for the duplicated
> topics,
> > > right?
> > >
> > > 3. Yes, it's true that today, one can specify both partitions and
> > > replicaAssignment in the TopicCommand. However, partitions is actually
> > > ignored. So, it will be clearer if we don't allow users to do this.
> > >
> > > 4. How many specific error codes like InvalidPartitions and
> > InvalidReplicas
> > > are needed? If it's not that many, giving out more specific error will
> be
> > > useful for non-java clients.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Apr 15, 2015 at 10:23 AM, Andrii Biletskyi <
> > > andrii.bilets...@stealth.ly> wrote:
> > >
> > > > Guys,
> > > >
> > > > Thanks for the discussion!
> > > >
> > > > Summary:
> > > >
> > > > 1. Q: How KAFKA-1367 (isr is inconsistent in brokers' metadata cache)
> > can
> > > >         affect implementation?
> > > >     A: We can fix this issue for the leading broker - ReplicaManager
> > (or
> > > > Partition)
> > > >         component should have accurate isr list, then with leading
> > broker
> > > > having correct
> > > >         info, to do a describe-topic we will need to define leading
> > > brokers
> > > > for partitions
> > > >         and ask those for a correct isr list.
> > > >         Also, we should consider adding lag information to TMR for
> each
> > > > follower for
> > > >         partition reassignment, as Jun suggested above.
> > > >
> > > > 2. Q: What if user adds different alter commands for the same topic
> in
> > > > scope
> > > >          of one batch request?
> > > >     A: Because of the async nature of AlterTopicRequest it will be
> very
> > > > hard then
> > > >         to "assemble" the expected (in terms of checking whether
> > request
> > > is
> > > > complete)
> > > >         result if we let users do this. Also it will be very
> confusing.
> > > It
> > > > was proposed not to
> > > >         let users do this (probably add new Error for such cases).
> > > >
> > > > 3. Q: AlterTopicRequest semantics: now when we merged AlterTopic and
> > > >         ReassingPartitons in which order AlterTopic fields should be
> > > > resolved?
> > > >     A: This item is not clear. There was a proposal to let user
> change
> > > only
> > > >         one thing at a time, e.g. specify either new Replicas, or
> > > > ReplicaAssignment.
> > > >         This can be a simple solution, but it's a very strict rule.
> > E.g.
> > > > currently with
> > > >         TopicCommand user can increase nr of partitions and define
> > > replica
> > > > assignment
> > > >         for newly added partitions. Taking into account item 2. this
> > will
> > > > be even harder
> > > >         for user to achieve this.
> > > >
> > > > 4. Q: Do we need such accurate errors returned from the server:
> > > > InvalidArgumentPartitions,
> > > >          InvalidArgumentReplicas etc.
> > > >     A: I started implementation to add proposed error codes and now I
> > > think
> > > > probably
> > > >         InvalidArgumentError should be sufficient. We can do simple
> > > > validations on
> > > >         the client side (e.g. AdminClient can ensure nr of partitions
> > > > argument is positive),
> > > >         others - which can be covered only on server (probably
> invalid
> > > > topic config,
> > > >         replica assignment includes dead broker etc) - will be done
> on
> > > > server, and in case
> > > >         of invalid argument we will return InvalidArgumentError
> without
> > > > specifying the
> > > >         concrete field.
> > > >
> > > > It'd be great if we could cover these remaining issues, looks like
> they
> > > are
> > > > minor,
> > > > at least related to specific messages, not the overall protocol. - I
> > > think
> > > > with that I can
> > > > update confluence page and update patch to reflect all discussed
> items.
> > > > This patch
> > > > will probably include Wire protocol messages and server-side code to
> > > handle
> > > > new
> > > > requests. AdminClient and cli-tool implementation can be the next
> step.
> > > >
> > > > Thanks,
> > > > Andrii Biletskyi
> > > >
> > > > On Wed, Apr 15, 2015 at 7:26 PM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Andrii,
> > > > >
> > > > > 500. I think what you suggested also sounds reasonable. Since ISR
> is
> > > only
> > > > > maintained accurately at the leader, TMR can return ISR if the
> broker
> > > is
> > > > > the leader of a partition. Otherwise, we can return an empty ISR.
> For
> > > > > partition reassignment, it would be useful to know the lag of each
> > > > > follower. Again, the leader knows this info. We can probably
> include
> > > that
> > > > > info in TMR as well.
> > > > >
> > > > > 300. I think it's probably reasonable to restrict AlterTopicRequest
> > to
> > > > > change only one thing at a time, i.e., either partitions, replicas,
> > > > replica
> > > > > assignment or config.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Apr 13, 2015 at 10:56 AM, Andrii Biletskyi <
> > > > > andrii.bilets...@stealth.ly> wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > 404. Great, thanks!
> > > > > >
> > > > > > 500. If I understand correctly KAFKA-1367 says ISR part of TMR
> may
> > > > > > be inconsistent. If so, then I believe all admin commands but
> > > > > describeTopic
> > > > > > are not affected. Let me emphasize that it's about AdminClient
> > > > > operations,
> > > > > > not about Wire Protocol requests. What I mean:
> > > > > > To verify AdminClient.createTopic we will need (consistent)
> > 'topics'
> > > > set
> > > > > > from TMR (we don't need isr)
> > > > > > To verify alterTopic - again, probably 'topics' and 'assigned
> > > > replicas' +
> > > > > > configs
> > > > > > To verify deleteTopic - only 'topics'
> > > > > > To verify preferredReplica - 'leader', 'assigned replicas'
> > > > > > To verify reassignPartitions - 'assigned replicas' ? (I'm not
> sure
> > > > about
> > > > > > this one)
> > > > > > If everything above is correct, then AdminClient.describeTopic is
> > the
> > > > > only
> > > > > > command under risk. We can actually workaround it - find out the
> > > leader
> > > > > > broker
> > > > > > and ask TMR that leading broker to get up-to-date isr list.
> > > > > > Bottom line: looks like 1367 is a separate issue, and is not a
> > > blocker
> > > > > for
> > > > > > this
> > > > > > KIP. I'm a bit concerned about adding new requests as a must-have
> > > part
> > > > > > of this KIP when we don't know what we want to include to those
> > > > requests.
> > > > > >
> > > > > > Also, I'd like to write down the new AlterTopicRequest semantics
> > (if
> > > we
> > > > > > decide
> > > > > > to include replicas there and merge it with
> > > ReassignPartitionsRequest)
> > > > > > 300. AlterTopicRequest => [TopicName Partitions Replicas
> > > > > ReplicaAssignment
> > > > > > [AddedConfigEntry] [DeletedConfig]]
> > > > > > The fields are resolved in this sequence:
> > > > > > 1. Either partition or replicas is defined:
> > > > > > ---1.1. ReplicaAssignment is not defined - generate automatic
> > replica
> > > > > > assignment
> > > > > >           for newly added partitions or for replicas parameter
> > > > increased
> > > > > > ---1.2. ReplicaAssignment is defined - increase topic partitions
> if
> > > > > > 'partitions' defined,
> > > > > >           reassign partitions according to ReplicaAssignment
> > > > > > 2. Neither partition nor replicas is defined:
> > > > > > ---2.1. ReplicaAssignment is defined - it's a reassign replicas
> > > request
> > > > > > ---2.2. ReplicaAssignment is not defined - just change topic
> > configs
> > > > > > 3. Config fields are handled always and independently from
> > > > > >     partitions+replicas/replicaAssingment
> > > > > > A bit sophisticated, but should cover all cases. Another option -
> > we
> > > > can
> > > > > > say you can define either partitions+replicas or
> replicaAssignment.
> > > > > >
> > > > > > 300.5. There is also a new question related to AlterTopicRequest
> -
> > > > should
> > > > > > we
> > > > > > allow users multiple alter-topic instructions for one topic in
> one
> > > > batch?
> > > > > > I think if we go this way, user will expect we optimize and group
> > > > > requests
> > > > > > for one topic, but it will add a lot of burden, especially taken
> > into
> > > > > > account
> > > > > > async semantics of the AlterTopicRequest. I'd rather return some
> > > error
> > > > > > code,
> > > > > > or ignore all but first. Thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Andrii Biletskyi
> > > > > >
> > > > > > On Mon, Apr 13, 2015 at 6:34 AM, Jun Rao <j...@confluent.io>
> wrote:
> > > > > >
> > > > > > > Andrii,
> > > > > > >
> > > > > > > 404. Jay and I chatted a bit. We agreed to leave
> > createTopicRequest
> > > > as
> > > > > > > async for now.
> > > > > > >
> > > > > > > There is another thing.
> > > > > > >
> > > > > > > 500. Currently, we have this issue (KAFKA-1367) that the ISR in
> > the
> > > > > > > metadata cache can be out of sync. The reason is that ISR is
> > really
> > > > > > > maintained at the leader. We can potentially add a new
> > > > > BrokerMetaRequest,
> > > > > > > which will return useful stats specific to a broker. Such stats
> > can
> > > > > > include
> > > > > > > (1) for each partition whose leader is on this broker, the ISR
> > and
> > > > the
> > > > > > lag
> > > > > > > (in messages) for each of the followers, (2) space used per
> > > > partition,
> > > > > > (3)
> > > > > > > remaining space per log dir (not sure how easy it is to get
> this
> > > > info).
> > > > > > If
> > > > > > > we have this new request, we can probably remove the ISR part
> > from
> > > > TMR
> > > > > > v1.
> > > > > > > Currently, the producer/consumer client don't really care about
> > > ISR.
> > > > > The
> > > > > > > admin client will then issue BrokerMetaRequest to find out ISR
> > and
> > > > > other
> > > > > > > stats.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 7, 2015 at 12:10 PM, Andrii Biletskyi <
> > > > > > > andrii.bilets...@stealth.ly> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > A summary of our discussion:
> > > > > > > >
> > > > > > > > 201. Q: Cluster updates in backward compatible way.
> > > > > > > >         A: Add topicConfigs map property and change
> > constructor,
> > > > this
> > > > > > > > shouldn't break Consumer/Producer since TMR is used in
> > > > NetworkClient,
> > > > > > > > not directly by Consumer/Producer.
> > > > > > > >
> > > > > > > > 300. Q: Can we merge AlterTopic and ReassignPartitions
> > requests?
> > > > > > > >         A: It looks like in terms of Wire Protocol partition
> > > > > > reassignment
> > > > > > > > can
> > > > > > > > be just an application of AlterTopicRequest. On the
> AdminClient
> > > > side
> > > > > we
> > > > > > > can
> > > > > > > > split this into two separate methods, if needed.
> > > > > > > >
> > > > > > > > Some additional items that were added today:
> > > > > > > > 400. Q: Do we need ListTopicsRequest, we can use TMR for this
> > > > > purpose.
> > > > > > > >         A: The answer depends on whether we can leverage
> > > ListTopics
> > > > > in
> > > > > > > > consumer/producer, because the only benefit of the ListTopics
> > is
> > > > > > > > performance
> > > > > > > > optimization, otherwise it doesn't worth it.
> > > > > > > >
> > > > > > > > 401. Q: AdminClient.topicExists - do we need it?
> > > > > > > >         A: AdminClient.listTopics should be sufficient.
> > > > > > > >
> > > > > > > > 402. Review AdminClient API and use separate objects instead
> of
> > > > > > > collections
> > > > > > > > for methods arguments / return results (e.g. preferredReplica
> > > > accepts
> > > > > > > > Map<String, List<Int>>
> > > > > > > > might be better to add separate java object)
> > > > > > > >
> > > > > > > > 403. Error number in KIP-4 (100x). Currently there are no
> > > dedicated
> > > > > > > ranges
> > > > > > > > for errors, we will probably continue doing it this way.
> > > > > > > >
> > > > > > > > 404. There were some concerns again about the asynchronous
> > > > semantics
> > > > > > > > of the admin requests. Jun and Jay to agree separately how we
> > > want
> > > > > > > > to handle it.
> > > > > > > >
> > > > > > > > Please add / correct me if I missed something.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Andrii Biletskyi
> > > > > > > >
> > > > > > > > On Tue, Apr 7, 2015 at 4:11 PM, Andrii Biletskyi <
> > > > > > > > andrii.bilets...@stealth.ly> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I wasn't able to send email to our thread (it says we
> > exceeded
> > > > > > message
> > > > > > > > > size limit :)).
> > > > > > > > > So I'm starting the new one.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jun,
> > > > > > > > >
> > > > > > > > > Thanks again for the review. Answering your comments:
> > > > > > > > >
> > > > > > > > > 201. I'm not sure I understand how can we evolve Cluster in
> > > > > backward
> > > > > > > > > compatible way. In my understanding topic configs are not
> > > > returned
> > > > > > > > > currently -
> > > > > > > > > in TMR_V0. Thus we need to add new property in Cluster -
> smth
> > > > like
> > > > > > > > > private final Map<String, List<ConfigEntry>> topicConfigs;
> > > > > > > > > Which affects Cluster constructor, which is used in
> > > > > > > MetadataResponse.java
> > > > > > > > > - not sure whether we can change Cluster this way so it's
> > > > backward
> > > > > > > > > compatible,
> > > > > > > > > I suppose - no.
> > > > > > > > > Let me know if I'm missing something...
> > > > > > > > >
> > > > > > > > > 300. Hm, so you propose to give up ReassignPartition as a
> > > > separate
> > > > > > > > command?
> > > > > > > > > That's interesting, let's discuss it today in detail.
> > > > > > > > > Two small points here: 1) afaik currently
> replica-assignment
> > > > > argument
> > > > > > > in
> > > > > > > > > alter-topic
> > > > > > > > > (from TopicCommand) doesn't reassign partitions, it lets
> > users
> > > > > > specify
> > > > > > > > > replica
> > > > > > > > > assignment for newly added partition
> (AddPartitionsListener)
> > 2)
> > > > > > > > > ReassignPartitions
> > > > > > > > > command involves a little bit more than just changing
> replica
> > > > > > > assignment
> > > > > > > > > in zk.
> > > > > > > > > People are struggling with partition reassignment so I
> think
> > > it's
> > > > > > good
> > > > > > > to
> > > > > > > > > have explicit
> > > > > > > > > request for it so we can handle it independently, also as
> > > > mentioned
> > > > > > > > > earlier we'll
> > > > > > > > > probably add in future some better status check procedure
> for
> > > > this
> > > > > > > > > long-running
> > > > > > > > > request.
> > > > > > > > >
> > > > > > > > > 301. Good point. We also agreed to use clientId as an
> > > identifier
> > > > > for
> > > > > > > the
> > > > > > > > > requester -
> > > > > > > > > whether it's a producer client or admin. I think we can go
> > with
> > > > > -1/-1
> > > > > > > > > approach.
> > > > > > > > >
> > > > > > > > > 302. Again, as said above replica-assignment in alter-topic
> > > > doesn't
> > > > > > > > change
> > > > > > > > > replica assignment of the existing partitions. But we can
> > think
> > > > > about
> > > > > > > it
> > > > > > > > > in general -
> > > > > > > > > how can we change topic replication factor? The easy way -
> we
> > > > don't
> > > > > > > need
> > > > > > > > > it,
> > > > > > > > > we can use reassign partitions. Not sure whether we want to
> > add
> > > > > > special
> > > > > > > > > logic
> > > > > > > > > to treat this case...
> > > > > > > > >
> > > > > > > > > 303.1. Okay, sure, I'll generalize topicExists().
> > > > > > > > > 303.2. I think, yes, we need separate verify methods as a
> > > status
> > > > > > check
> > > > > > > > > procedure,
> > > > > > > > > because respective requests are long running, and CLI user
> > > > > > potentially
> > > > > > > > > will
> > > > > > > > > asynchronously call reassign-partitions, do other stuff
> (e.g.
> > > > > create
> > > > > > > > > topics) periodically
> > > > > > > > > checking status of the partition reassignment. Anyway we'll
> > > have
> > > > to
> > > > > > > > > implement this logic
> > > > > > > > > because it's a criterion of the completed Future of the
> > > reassign
> > > > > > > > > partitions async
> > > > > > > > > call, we'll to have make those methods just public.
> > > > > > > > > 303.3. If preferredReplica returns Future<Map<String,
> > Errors>>
> > > > than
> > > > > > > what
> > > > > > > > > is an error
> > > > > > > > > in terms of preferred replica leader election? As I
> > understand
> > > we
> > > > > can
> > > > > > > > only
> > > > > > > > > check
> > > > > > > > > whether it has succeeded (leader == AR.head)  or not _yet_.
> > > > > > > > >
> > > > > > > > > 304.1. Sure, let's add timeout to reassign/preferred
> replica.
> > > > > > > > > 304.2. This can be finalized after we discuss 300.
> > > > > > > > >
> > > > > > > > > 305. Misprints - thanks, fixed.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Andrii Biletskyi
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to