Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Colin McCabe Mon, 06 Feb 2017 11:41:32 -0800

On Fri, Feb 3, 2017, at 16:57, Dong Lin wrote:
> Thanks for the reply, Colin. I have some comments inline.


Hi Dong L.,

> 
> In addition, I also have some comments regarding the Future() in response
> to your latest email. As Ismael mentioned, we have added
> purgeDataBefore()
> API in AdminClient. This API returns Future() that allows user to purge
> data in either syn or async manner. And we have presented use-case for
> both
> syn and async usage of this API in the discussion thread of KIP-107. I
> think we should at least return a Future() object in this case, right?

Hmm.  I didn't see any discussion in the KIP-107 [DISCUSS] thread of why
a Future<> based API was proposed.  It seemed like the discussion
focused on other issues (or maybe I missed the an email thread?)

> 
> As you mentioned, we can transform a blocking API into a Futures-based
> API
> by using a thread pool. But thread pool seems inconvenient as compared to
> using future().get() which transform a Future-based API into a blocking
> API. Would it be a good reason to return Future() for all those API where
> we need both syn and async mode?

That's a good point.  It's easier to build a sync API on top of Futures
than the reverse.

> 
> 
> On Fri, Feb 3, 2017 at 10:20 AM, Colin McCabe <cmcc...@apache.org> wrote:
> 
> > On Thu, Feb 2, 2017, at 17:54, Dong Lin wrote:
> > > Hey Colin,
> > >
> > > Thanks for the KIP. I have a few comments below:
> > >
> > > - I share similar view with Ismael that a Future-based API is better.
> > > PurgeDataFrom() is an example API that user may want to do it
> > > asynchronously even though there is only one request in flight at a time.
> > > In the future we may also have some admin operation that allows user to
> > > move replica from one broker to another, which also needs to work in both
> > > sync and async style. It seems more generic to return Future<?> for any
> > > API
> > > that requires both mode.
> > >
> > > - I am not sure if it is the cleanest way to have enum classes
> > > CreateTopicsFlags and DeleteTopicsFlags. Are we going to create such
> > > class
> > > for every future API that requires configuration? It may be more generic
> > > to
> > > provide Map<String, ?> to any admin API that operates on multiple
> > > entries.
> > > For example, deleteTopic(Map<String, Boolean>). And it can be Map<String,
> > > Properties> for those API that requires multiple configs per entry. And
> > > we
> > > can provide default value, doc, config name for those API as we do
> > > with AbstractConfig.
> >
> > Thanks for the comments, Dong L.
> >
> > EnumSet<CreateTopicsFlags>, EnumSet<DeleteTopicsFlags>, and so forth are
> > type-safe ways for the user to pass a set of boolean flags to the
> > function.  It is basically a replacement for having an api like
> > createTopics(...., boolean nonblocking, boolean validateOnly).  It is
> > preferrable to having the boolean flags because it's immediately clear
> > when reading the code what createTopics(...,
> > CreateTopicsFlags.VALIDATE_ONLY) means, whereas it is not immediately
> > clear what createTopics(..., false, true) means.  It also prevents
> > having lots of function overloads over time, which becomes confusing for
> > users.  The EnumSet is not intended as a replacement for all possible
> > future arguments we might add, but just an easy way to add more boolean
> > arguments later without adding messy function overloads or type
> > signatures with many booleans.
> >
> > Map<String, ?> is not type-safe, and I don't think we should use it in
> > our APIs.  Instead, we should just add function overloads if necessary.
> >
> 
> I agree that using EnumSet is type safe.
> 
> 
> >
> > >
> > > - I not sure if "Try" is very intuitive to Java developer. Is there any
> > > counterpart of scala's "Try" in java
> >
> > Unfortunately, there is no equivalent to Try in the standard Java
> > library.  That is the first place I checked, and I spent a while
> > searching.  The closest is probably Java 8's Optional.  However,
> > Optional just allows us to express Some(thing) or None, so callers would
> > not be able to determine what the error was.
> 
> 
> > > We actually have quite a few
> > > existing
> > > classes in Kafka that address the same problem, such as
> > > ProduceRequestResult, LogAppendResult etc. Maybe we can follow the same
> > > conversion and use *Result as this class name.
> >
> > Hmm.  ProduceRequestResult and LogAppendResult just store an exception
> > alongside the data, and make the caller responsible for checking whether
> > the exception is null (or None) before looking at the data.  I don't
> > think this is a good solution, because if the user forgets to do this,
> > they will interpret whatever is in the data fields as valid, even when
> > it is not.  ProduceRequestResult and LogAppendResult are also internal
> > classes, not user-facing APIs, so we did not spend time thinking about
> > how to make them easy to use for end-users.
> >
> 
> I am not actually suggesting to use LogAppendResult directly. I am
> wondering if it would be more intuitive for developer to name this class
> something like OperationResult instead of Try. The OperationResult can
> store whatever we want to store with current "Try" class you proposed in
> the KIP. It is just my personal opinion that "Try" doesn't directly tell
> me
> what the class actually does. But I am not too worried about it if you
> think "Try" is better.

I guess part of the reason I used "Try" is that scala.util.Try [
http://www.scala-lang.org/api/2.9.3/scala/util/Try.html ] is very
well-known to most Scala developers and fills the same function.  I'm
open to other names... "Result" might be better than "OperationResult",
since it's shorter.

> 
> 
> >
> > >
> > > - How are we going to allow user to specify timeout for blocking APIs
> > > such
> > > as deleteTopic? Is this configured per AdminClient, or should it be
> > > specified in the API parameter?
> >
> > Right now, it is specified by the configuration of the AdminClient.
> >
> 
> User probably wants different client-side timeout for different API,
> right?
> For example, user may want to block for a long time for topic creation
> API
> because he needs topic to be created before doing the next thing. But the
> user may want to call purgeDataBefore() API periodically and
> automatically
> without waiting for purge operation to complete. So if we don't provide
> Future() to user, we should probably allow user to specify client-side
> timeout in the API.

Keep in mind "async on the client" (futures) is different than "async on
the server" (setting very short or 0 timeouts in the RPC).  We could
have a future-based API and still have long server timeouts.  It just
means that the futures will wait longer before returning failure or
success.  I agree that it would be nice to be able to set per-operation
timeouts, though.

cheers,
Colin

> 
> 
> >
> > >
> > > - Are we going to have this class initiate its own thread, as we do with
> > > Sender class, to send/receive requests? If yes, it will be useful to have
> > > have a class that extends AbstractConfig and specifies config and their
> > > default values.
> >
> > Yes, I agree.  I will add this to the KIP.
> >
> > best,
> > Colin
> >
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > > On Thu, Feb 2, 2017 at 3:02 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Hi Colin,
> > > >
> > > > Thanks for the KIP, great to make progress on this. I have some initial
> > > > comments, will post more later:
> > > >
> > > > 1. We have KafkaProducer that implements the Producer interface and
> > > > KafkaConsumer that implements the Consumer interface. Maybe we could
> > have
> > > > KafkaAdminClient that implements the AdminClient interface? Or maybe
> > just
> > > > KafkaAdmin. Not sure, some ideas for consideration. Also, I don't
> > think we
> > > > should worry about a name clash with the internal AdminClient written
> > in
> > > > Scala. That will go away soon enough and choosing a good name for the
> > > > public class is what matters.
> > > >
> > > > 2. We should include the proposed package name in the KIP
> > > > (probably org.apache.kafka.clients.admin?).
> > > >
> > > > 3. It would be good to list the supported configs.
> > > >
> > > > 4. KIP-107, which passed the vote, specifies the introduction of a
> > method
> > > > to AdminClient with the following signature. We should figure out how
> > it
> > > > would look given this proposal.
> > > >
> > > > Future<Map<TopicPartition, PurgeDataResult>>
> > > > purgeDataBefore(Map<TopicPartition, Long> offsetForPartition)
> > > >
> > > > 5. I am not sure about rejecting the Futures-based API. I think I would
> > > > prefer that, personally. Grant had an interesting idea of not exposing
> > the
> > > > batch methods in the AdminClient to start with to keep it simple and
> > > > relying on a Future based API to make it easy for users to run things
> > > > concurrently. This is consistent with the producer and Java 8 makes
> > things
> > > > a lot nicer with CompletableFuture (similar to Scala Futures). I will
> > think
> > > > more about this and other details of the proposal and send a follow-up.
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Feb 2, 2017 at 6:54 PM, Colin McCabe <cmcc...@apache.org>
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I wrote up a Kafka improvement proposal for adding an
> > > > > AdministrativeClient interface.  This is a continuation of the work
> > on
> > > > > KIP-4 towards centralized administrative operations.  Please check
> > out
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%
> > > > 3A+Add+a+public+
> > > > > AdministrativeClient+API+for+Kafka+admin+operations
> > > > >
> > > > > regards,
> > > > > Colin
> > > > >
> > > >
> >

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

Reply via email to