Re: [DISCUSS] KIP-113: Support replicas movement between log directories

Tom Bentley Thu, 10 Aug 2017 06:57:59 -0700

I've spent some time thinking about KIP-179 and KIP-113, the proposed
algorithms and APIs, and trying to weigh the pros and cons of various
alternative options.


I think Dong's reasons for the algorithm for inter-broker move in KIP-113
make a lot of sense. I don't think it would be at all simple to try to
change that algorithm to one where the whole thing can be triggered by a
single call to an AdminClient method. So I guess we should try to keep as
much of that algorithm as possible.

KIP-179 will need to change this step

 - The script creates reassignment znode in zookeeper.
>

with an AdminClient API call. This call can the same one as currently
specified in KIP-179 -- reassignPartitions() -- except the argument needs
to take into account the need to pass log dirs as well as broker ids. Thus
I would suggest

    ReassignPartitionsResult reassignPartitions(Map<TopicPartition,
List<ReplicAssignment>> assignments)

where:

    class ReplicAssignment {
        int broker()
        String logDirectory()// can be null
    }

(This is just a Java representation of the reassignment json in KIP-113,
which itself is a superset of the reassignment json currently in use)

The corresponding protocol would look like this:

    ReassignPartitionsRequest => timeout validate_only log_dirs
[topic_assignment]
      timout => int32
      validate_only => boolean
      log_dirs => [string]
      topic_assignment => topic [partition_assignment]
        topic => string
        partition_assignment => partition [replica]
          partition => int32
          replica => broker log_dir_index
            broker => int32
            log_dir_index => int16

The purpose of log_dirs is to serialize each log dir in the request only
once. These are then referred to by index in log_dir_index. The
log_dir_index can be -1, which means the caller doesn't care which log
directory should be used on the receiving broker.

This request can be sent to *any* broker. The broker which receives a
ReassignPartitionsRequest essentially converts it into reassignment JSON
and writes that JSON to the znode, then returns a
ReassignPartitionsResponse:

    ReassignPartitionsResponse => throttle_time_ms [topic_assignment_result]
      throttle_time_ms => INT32
          log_dirs => [string]
      topic_assignment_result => topic partition_assignment_result
            topic => STRING
            partition_assignment_result => partition [replica_result]
              partition => int32
              replica_result => broker log_dir_index error_code
error_message
                broker => int32
                log_dir_index => int16
            error_code => INT16
                 error_message => NULLABLE_STRING

This is using the same encoding scheme as wrt log_dirs as described above.

Meanwhile the controller is notified by ZK of the change in value of the
znode and proceeds, as currently, by sending LeaderAndIsrRequest and
StopReplicaRequest in order to complete the reassignments.

The remaining problem is around how to measure progress of reassignment. As
mentioned in the email I wrote this morning, I think we really need two
different lag calculations if we're using the lag to measure progress and
we want the property that lag=0 means reassignment has finished. The
problem with that, I now realise, is the script might be called with
reassignments which are a mix of:

* inter-broker moves without a log dir (=> use HW-replicaLEO)
* inter-broker moves with a log dir (=> use HW-replicaLEO)
* intra-broker moves with a log dir (=> use .log_LEO - .move_LEO)

And if there were two APIs we'd end up needing to make both kinds of query
to each broker in the cluster. This morning I said:

But AFAICS this observation doesn't really help much in terms of the APIs
> concerned though. Since the requests would still need to go to different
> brokers depending on which kind of movement is being performed.
>

But I wonder if that's *really* such a problem: In the case of an
inter-broker move we just need to ask the leader, and in the case of an
intra-broker move we just have to ask that broker. In generally we'd need a
single request to each broker in the cluster. Then each broker would need
to service that request, but presumably it's just pulling a number out of a
ConcurrentHashMap, which is updated by the replica movement code in each of
the two cases (inter-broker and intra-broker). WDYT?

Assuming no one can see any glaring holes in what I'm proposing here, or
wants to suggest a workable alternative set of APIs and algorithms, then
I'll update KIP-179 to this effect.

Thanks for taking the time to read this far!

Tom

On 10 August 2017 at 11:56, Tom Bentley <[email protected]> wrote:

> Hi Dong and Jun,
>
> It seems that KIP-179 does not explicitly specify the definition of this
>> lag.
>
>
> Given that the definition of "caught up" is "is the replica in the ISR?",
> I found the code in Partition.maybeExpandIsr() which decides whether a
> replica should be added to the to the ISR and it uses replica.logEndOffset.
> offsetDiff(leaderHW) >= 0, so for this purpose I would define the lag as
> max(leaderHW - replicaLEO, 0). I think we agree this should work for
> inter-broker movement, where the leader knows these quantities.
>
> As Dong says, this doesn't work for the intra-broker case:
>
> Note that we can not calculate lag as max(0, HW - LEO)
>> because we still need the difference between two lags to measure the
>> progress of intra-broker replica movement.
>>
>
> It seems to me that the intra-broker case is actually a special case of
> the inter-broker case. Conceptually with an intra-broker move the ".log"
> replica is the leader, the ".move" directory is the follower, the ISR is
> the singleton containing the leader, thus the HW if the LEO of the ".log".
> Viewed in this way, Dong's method of leaderLEO - followerLEO is the same
> thing for the intra-broker case as HW-LEO is for the inter-broker case.
>
> But AFAICS this observation doesn't really help much in terms of the APIs
> concerned though. Since the requests would still need to go to different
> brokers depending on which kind of movement is being performed.
>
> So perhaps this is another case where maybe it makes sense to keep the two
> APIs separate, one API for measuring inter-broker movement progress an
> another for the intra-broker case. WDYT?
>
> Thanks for the continuing discussion on this!
>
> Tom
>
>
> On 10 August 2017 at 05:28, Dong Lin <[email protected]> wrote:
>
>> Hey Jun,
>>
>> I have been thinking about whether it is better to return lag (i.e. HW -
>> LEO) instead of LEO. Note that the lag in the DescribeDirsResponse may be
>> negative if LEO > HW. It will almost always be negative for leader and
>> in-sync replicas. Note that we can not calculate lag as max(0, HW - LEO)
>> because we still need the difference between two lags to measure the
>> progress of intra-broker replica movement. The AdminClient API can choose
>> to return max(0, HW - LEO) depending on whether it is used for tracking
>> progress of inter-broker reassignment or intra-broker movement. Is it OK?
>> If so, I will update the KIP-113 accordingly to return lag in the
>> DescribeDirsResponse .
>>
>> Thanks,
>> Dong
>>
>>
>>
>> <https://www.avast.com/sig-email?utm_medium=email&utm_source
>> =link&utm_campaign=sig-email&utm_content=webmail&utm_term=icon>
>> Virus-free.
>> www.avast.com
>> <https://www.avast.com/sig-email?utm_medium=email&utm_source
>> =link&utm_campaign=sig-email&utm_content=webmail&utm_term=link>
>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>>
>> On Wed, Aug 9, 2017 at 5:06 PM, Jun Rao <[email protected]> wrote:
>>
>> > Hi, Dong,
>> >
>> > Yes, the lag in a replica is calculated as the difference of LEO of the
>> > replica and the HW. So, as long as a replica is in sync, the lag is
>> always
>> > 0.
>> >
>> > So, I was suggesting to return lag instead of LEO in
>> DescribeDirsResponse
>> > for each replica. I am not sure if we need to return HW though.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Wed, Aug 9, 2017 at 5:01 PM, Dong Lin <[email protected]> wrote:
>> >
>> > > Hey Jun,
>> > >
>> > > It just came to me that you may be assuming that folower_lag = HW -
>> > > follower_LEO. If that is the case, then we need to have new
>> > > request/response to retrieve this lag since the DescribeDirsResponse
>> > > doesn't even include HW. It seems that KIP-179 does not explicitly
>> > specify
>> > > the definition of this lag.
>> > >
>> > > I have been assuming that follow_lag = leader_LEO - follower_LEO given
>> > that
>> > > the request is used to query the reassignment status. Strictly
>> speaking
>> > the
>> > > difference between leader_LEO and the HW is limited by the amount of
>> data
>> > > produced in KafkaConfig.replicaLagTimeMaxMs, which is 10 seconds. I
>> also
>> > > assumed that 10 seconds is probably not a big deal given the typical
>> time
>> > > length of the reassignment.
>> > >
>> > > Thanks,
>> > > Dong
>> > >
>> > > On Wed, Aug 9, 2017 at 4:40 PM, Dong Lin <[email protected]> wrote:
>> > >
>> > > > Hey Jun,
>> > > >
>> > > > If I understand you right, you are suggesting that, in the case when
>> > > there
>> > > > is continuous incoming traffic, the approach in the KIP-179 will
>> report
>> > > lag
>> > > > as 0 whereas the approach using DescribeDirsRequest will report lag
>> as
>> > > > non-zero. But I think the approach in KIP-179 will also report
>> non-zero
>> > > lag
>> > > > when there is continuous traffic. This is because at the time the
>> > leader
>> > > > receives ReplicaStatusRequest, it is likely that some data has been
>> > > > appended to the partition after the last FetchRequest from the
>> > follower.
>> > > > Does this make sense?
>> > > >
>> > > > Thanks,
>> > > > Dong
>> > > >
>> > > >
>> > > >
>> > > > On Wed, Aug 9, 2017 at 4:24 PM, Jun Rao <[email protected]> wrote:
>> > > >
>> > > >> Hi, Dong,
>> > > >>
>> > > >> As for whether to return LEO or lag, my point was the following.
>> What
>> > > you
>> > > >> are concerned about is that an in-sync replica could become out of
>> > sync
>> > > >> again. However, the more common case is that once a replica is
>> caught
>> > > up,
>> > > >> it will stay in sync afterwards. In that case, once the
>> reassignment
>> > > >> process completes, if we report based on lag, all lags will be 0.
>> If
>> > we
>> > > >> report based on Math.max(0, leaderLEO - followerLEO), the value may
>> > not
>> > > be
>> > > >> 0 if there is continuous incoming traffic, which will be confusing
>> to
>> > > the
>> > > >> user.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Jun
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Tue, Aug 8, 2017 at 6:26 PM, Dong Lin <[email protected]>
>> wrote:
>> > > >>
>> > > >> > Hey Jun,
>> > > >> >
>> > > >> > Thanks for the comment!
>> > > >> >
>> > > >> > Yes, it should work. The tool can send request to any broker and
>> > > broker
>> > > >> can
>> > > >> > just write the reassignment znode. My previous intuition is that
>> it
>> > > may
>> > > >> be
>> > > >> > better to only send this request to controller. But I don't have
>> > good
>> > > >> > reasons for this restriction.
>> > > >> >
>> > > >> > My intuition is that we can keep them separate as well. Becket
>> and I
>> > > >> have
>> > > >> > discussed this both offline and in https://github.com/apache/
>> > > >> > kafka/pull/3621.
>> > > >> > Currently I don't have a strong opinion on this and I am open to
>> > using
>> > > >> only
>> > > >> > one API to do both if someone can come up with a reasonable API
>> > > >> signature
>> > > >> > for this method. For now I have added the method
>> alterReplicaDir()
>> > in
>> > > >> > KafkaAdminClient instead of the AdminClient interface so that the
>> > > >> > reassignment script can use this method without concluding what
>> the
>> > > API
>> > > >> > would look like in AdminClient in the future.
>> > > >> >
>> > > >> > Regarding DescribeDirsResponse, I think it is probably OK to have
>> > > >> slightly
>> > > >> > more lag. The script can calculate the lag of the follower
>> replica
>> > as
>> > > >> > Math.max(0, leaderLEO - followerLEO). I agree that it will be
>> > slightly
>> > > >> less
>> > > >> > accurate than the current approach in KIP-179. But even with the
>> > > current
>> > > >> > approach in KIP-179, the result provided by the script is an
>> > > >> approximation
>> > > >> > anyway, since there is delay from the time that leader returns
>> > > response
>> > > >> to
>> > > >> > the time that the script collects response from all brokers and
>> > prints
>> > > >> > result to user. I think if the slight difference in the accuracy
>> > > between
>> > > >> > the two approaches does not make a difference to the intended
>> > use-case
>> > > >> of
>> > > >> > this API, then we probably want to re-use the exiting
>> > request/response
>> > > >> to
>> > > >> > keep the protocol simple.
>> > > >> >
>> > > >> > Thanks,
>> > > >> > Dong
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Tue, Aug 8, 2017 at 5:56 PM, Jun Rao <[email protected]>
>> wrote:
>> > > >> >
>> > > >> > > Hi, Dong,
>> > > >> > >
>> > > >> > > I think Tom was suggesting to have the AlterTopicsRequest sent
>> to
>> > > any
>> > > >> > > broker, which just writes the reassignment json to ZK. The
>> > > controller
>> > > >> > will
>> > > >> > > pick up the reassignment and act on it as usual. This should
>> work,
>> > > >> right?
>> > > >> > >
>> > > >> > > Having a separate AlterTopicsRequest and AlterReplicaDirRequest
>> > > seems
>> > > >> > > simpler to me. The former is handled by the controller and the
>> > > latter
>> > > >> is
>> > > >> > > handled by the affected broker. They don't always have to be
>> done
>> > > >> > together.
>> > > >> > > Merging the two into a single request probably will make both
>> the
>> > > api
>> > > >> and
>> > > >> > > the implementation a bit more complicated. If we do keep the
>> two
>> > > >> separate
>> > > >> > > requests, it seems that we should just add
>> AlterReplicaDirRequest
>> > to
>> > > >> the
>> > > >> > > AdminClient interface?
>> > > >> > >
>> > > >> > > Now, regarding DescribeDirsResponse. I agree that it can be
>> used
>> > for
>> > > >> the
>> > > >> > > status reporting in KIP-179 as well. However, it seems that
>> > > reporting
>> > > >> the
>> > > >> > > log end offset of each replica may not be easy to use. The log
>> end
>> > > >> offset
>> > > >> > > will be returned from different brokers in slightly different
>> > time.
>> > > If
>> > > >> > > there is continuous producing traffic, the difference in log
>> end
>> > > >> offset
>> > > >> > > between the leader and the follower could be larger than 0
>> even if
>> > > the
>> > > >> > > follower has fully caught up. I am wondering if it's better to
>> > > instead
>> > > >> > > return the lag in offset per replica. This way, the status can
>> > > >> probably
>> > > >> > be
>> > > >> > > reported more reliably.
>> > > >> > >
>> > > >> > > Thanks,
>> > > >> > >
>> > > >> > > Jun
>> > > >> > >
>> > > >> > > On Tue, Aug 8, 2017 at 11:23 AM, Dong Lin <[email protected]
>> >
>> > > >> wrote:
>> > > >> > >
>> > > >> > > > Hey Tom,
>> > > >> > > >
>> > > >> > > > Thanks for the quick reply. Please see my comment inline.
>> > > >> > > >
>> > > >> > > > On Tue, Aug 8, 2017 at 11:06 AM, Tom Bentley <
>> > > [email protected]
>> > > >> >
>> > > >> > > > wrote:
>> > > >> > > >
>> > > >> > > > > Hi Dong,
>> > > >> > > > >
>> > > >> > > > > Replies inline, as usual
>> > > >> > > > >
>> > > >> > > > > > As I originally envisaged it, KIP-179's support for
>> > > reassigning
>> > > >> > > > > partitions
>> > > >> > > > > >
>> > > >> > > > > > would have more-or-less taken the logic currently in the
>> > > >> > > > > > > ReassignPartitionsCommand (that is, writing JSON to the
>> > > >> > > > > > > ZkUtils.ReassignPartitionsPath)
>> > > >> > > > > > > and put it behind a suitable network protocol API.
>> Thus it
>> > > >> > wouldn't
>> > > >> > > > > > matter
>> > > >> > > > > > > which broker received the protocol call: It would be
>> acted
>> > > on
>> > > >> by
>> > > >> > > > > brokers
>> > > >> > > > > > > being notified of the change in the ZK path, just as
>> > > >> currently.
>> > > >> > > This
>> > > >> > > > > > would
>> > > >> > > > > > > have kept the ReassignPartitionsCommand relatively
>> simple,
>> > > as
>> > > >> it
>> > > >> > > > > > currently
>> > > >> > > > > > > is.
>> > > >> > > > > > >
>> > > >> > > > > >
>> > > >> > > > > > I am not sure I fully understand your proposal. I think
>> you
>> > > are
>> > > >> > > saying
>> > > >> > > > > that
>> > > >> > > > > > any broker can receive and handle the AlterTopicRequest.
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > That's right.
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > > Let's say a
>> > > >> > > > > > non-controller broker received AlterTopicRequest, is this
>> > > broker
>> > > >> > > going
>> > > >> > > > to
>> > > >> > > > > > send LeaderAndIsrRequest to other brokers? Or is this
>> broker
>> > > >> create
>> > > >> > > the
>> > > >> > > > > > reassignment znode in zookeper?
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > Exactly: It's going to write some JSON to the relevant
>> znode.
>> > > >> Other
>> > > >> > > > brokers
>> > > >> > > > > will get notified by zk when the contents of this znode
>> > changes,
>> > > >> and
>> > > >> > do
>> > > >> > > > as
>> > > >> > > > > they do now. This is what the tool/script does now.
>> > > >> > > > >
>> > > >> > > > > I will confess that I don't completely understand the role
>> of
>> > > >> > > > > LeaderAndIsrRequest, since the current code just seems to
>> > write
>> > > to
>> > > >> > the
>> > > >> > > > > znode do get the brokers to do the reassignment. If you
>> could
>> > > >> explain
>> > > >> > > the
>> > > >> > > > > role of LeaderAndIsrRequest that would be great.
>> > > >> > > > >
>> > > >> > > >
>> > > >> > > > Currently only the controller will listen to the reassignment
>> > > znode
>> > > >> and
>> > > >> > > > sends LeaderAndIsrRequest and StopReplicaRequest to brokers
>> in
>> > > >> order to
>> > > >> > > > complete reassignment. Brokers won't need to listen to
>> zookeeper
>> > > for
>> > > >> > any
>> > > >> > > > reassignment -- brokers only reacts to the request from
>> > > controller.
>> > > >> > > > Currently Kafka's design replies a lot on the controller to
>> > keep a
>> > > >> > > > consistent view of who are the leader of partitions and what
>> is
>> > > the
>> > > >> ISR
>> > > >> > > > etc. It will be a pretty drastic change, if not impossible,
>> for
>> > > the
>> > > >> > > script
>> > > >> > > > to reassign partitions without going through controller.
>> > > >> > > >
>> > > >> > > > Thus I think it is likely that your AlterTopicsRequest can
>> only
>> > be
>> > > >> sent
>> > > >> > > to
>> > > >> > > > controller. Then the controller can create the reassignment
>> > znode
>> > > in
>> > > >> > > > zookeeper so that the information is persisted across
>> controller
>> > > >> fail
>> > > >> > > over.
>> > > >> > > > I haven't think through this in detail though.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > > I may have missed it. But I couldn't find
>> > > >> > > > > > the explanation of AlterTopicRequest handling in KIP-179.
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> > > > > You're right, it doesn't go into that much detail. I will
>> fix
>> > > >> that.
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > > >
>> > > >> > > > > > > KIP-113 is obviously seeking to make more radical
>> changes.
>> > > The
>> > > >> > > > > algorithm
>> > > >> > > > > > > described for moving a replica to a particular
>> directory
>> > on
>> > > a
>> > > >> > > > different
>> > > >> > > > > > > broker (
>> > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > >> > > > > > > 113%3A+Support+replicas+movement+between+log+
>> > > >> > directories#KIP-113:
>> > > >> > > > > > > Supportreplicasmovementbetweenlogdirectories-2)
>> > > >> > > > > > > Howtoreassignreplicabetweenlogdirectoriesacrossbrokers
>> > > >> > > > > > > <https://cwiki.apache.org/conf
>> luence/display/KAFKA/KIP-
>> > > >> > > > > > > 113%3A+Support+replicas+movement+between+log+
>> > > >> > directories#KIP-113:
>> > > >> > > > > > > Supportreplicasmovementbetweenlogdirectories-2%
>> > > >> > > > > > > 29Howtoreassignreplicabetweenl
>> > ogdirectoriesacrossbrokers>)
>> > > >> > > > > > > involves both sending AlterReplicaDirRequest to "the"
>> > broker
>> > > >> (the
>> > > >> > > > > > receiving
>> > > >> > > > > > > broker, I assume, but it's not spelled out), _as well
>> as_
>> > > >> writing
>> > > >> > > to
>> > > >> > > > > the
>> > > >> > > > > > ZK
>> > > >> > > > > > > node.
>> > > >> > > > > > >
>> > > >> > > > > > > This assumes the script (ReassignPartitionsCommand) has
>> > > direct
>> > > >> > > access
>> > > >> > > > > to
>> > > >> > > > > > > ZooKeeper, which is what KIP-179 is seeking to
>> deprecate.
>> > It
>> > > >> > seems
>> > > >> > > a
>> > > >> > > > > > waste
>> > > >> > > > > > > of time to put the logic in the script as part of
>> KIP-113,
>> > > >> only
>> > > >> > for
>> > > >> > > > > > KIP-179
>> > > >> > > > > > > to have to move it back to the controller.
>> > > >> > > > > > >
>> > > >> > > > > >
>> > > >> > > > > > I am not sure I understand what you mean by "It seems a
>> > waste
>> > > of
>> > > >> > time
>> > > >> > > > to
>> > > >> > > > > > put the logic in the script as part of KIP-113, only for
>> > > >> KIP-179 to
>> > > >> > > > have
>> > > >> > > > > to
>> > > >> > > > > > move it back to the controller".
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > Sorry, I misunderstood slightly what you were proposing in
>> > > >> KIP-113,
>> > > >> > so
>> > > >> > > > the
>> > > >> > > > > "waste of time" comment isn't quite right, but I'm still
>> not
>> > > >> > convinced
>> > > >> > > > that
>> > > >> > > > > KIP-113+KIP-179 (in its current form) ends with a
>> satisfactory
>> > > >> > result.
>> > > >> > > > >
>> > > >> > > > > Let me elaborate... KIP-113 says that to support
>> reassigning
>> > > >> replica
>> > > >> > > > > between log directories across brokers:
>> > > >> > > > > * ...
>> > > >> > > > > * The script sends AlterReplicaDirRequest to those brokers
>> > which
>> > > >> need
>> > > >> > > to
>> > > >> > > > > move replicas...
>> > > >> > > > > * The script creates reassignment znode in zookeeper.
>> > > >> > > > > * The script retries AlterReplicaDirRequest to those
>> broker...
>> > > >> > > > > * ...
>> > > >> > > > >
>> > > >> > > > > So the ReassignPartitionsCommand still talks to ZK
>> directly,
>> > but
>> > > >> now
>> > > >> > > it's
>> > > >> > > > > bracketed by calls to the AdminClient. KIP-179 could
>> replace
>> > > that
>> > > >> > > talking
>> > > >> > > > > to ZK directly with a new call to the AdminClient. But then
>> > > we've
>> > > >> > got a
>> > > >> > > > > pretty weird API, where we have to make three AdminClient
>> > calls
>> > > >> (two
>> > > >> > of
>> > > >> > > > > them to the same method), to move a replica. I don't really
>> > > >> > understand
>> > > >> > > > why
>> > > >> > > > > the admin client can't present a single API method to
>> achieve
>> > > >> this,
>> > > >> > and
>> > > >> > > > > encapsulate on the server side the careful sequence of
>> events
>> > > >> > necessary
>> > > >> > > > to
>> > > >> > > > > coordinate the movement. I understood this position is what
>> > > Ismael
>> > > >> > was
>> > > >> > > > > advocating when he said it was better to put the logic in
>> the
>> > > >> > > controller
>> > > >> > > > > than spread between the script and the controller. But
>> maybe I
>> > > >> > > > > misunderstood him.
>> > > >> > > > >
>> > > >> > > >
>> > > >> > > > I have some concern with putting this logic in controller
>> which
>> > > can
>> > > >> be
>> > > >> > > > found in my previous email. Before that is addressed, the
>> script
>> > > (or
>> > > >> > > > AdminClient) seems to be the simplest place to have this
>> logic.
>> > > >> > > >
>> > > >> > > > I agree it is better to have a single API to achieve both
>> > > partition
>> > > >> and
>> > > >> > > > replica -> dir assignment. I think it is likely that we will
>> > find
>> > > a
>> > > >> > good
>> > > >> > > > API to do both. I have updated the KIP-113 to remove API
>> > > >> > alterReplicaDir
>> > > >> > > > from AdminClient interface.
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > > I assume that the logic you mentioned is
>> > > >> > > > > > "movement of replica to the specified log directory".
>> This
>> > > logic
>> > > >> > (or
>> > > >> > > > the
>> > > >> > > > > > implementation of this logic) resides mainly in the
>> > > >> > KafkaAdminClient
>> > > >> > > > and
>> > > >> > > > > > broker. The script only needs to parse the json file as
>> > > >> appropriate
>> > > >> > > and
>> > > >> > > > > > call the new API in AdminClient as appropriate. The
>> logic in
>> > > the
>> > > >> > > script
>> > > >> > > > > is
>> > > >> > > > > > therefore not much and can be easily moved to other
>> classes
>> > if
>> > > >> > > needed.
>> > > >> > > > > >
>> > > >> > > > > > Can you clarify why this logic, i.e. movement of replica
>> to
>> > > the
>> > > >> > > > specified
>> > > >> > > > > > log directory, needs to be moved to controller in
>> KIP-179? I
>> > > >> think
>> > > >> > it
>> > > >> > > > can
>> > > >> > > > > > still be done in the script and controller should not
>> need
>> > to
>> > > >> worry
>> > > >> > > > about
>> > > >> > > > > > log directory of any replica.
>> > > >> > > > > >
>> > > >> > > > > > Thanks,
>> > > >> > > > > > Dong
>> > > >> > > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-113: Support replicas movement between log directories

Reply via email to