And to clarify a bit further: the goal is for both standalone and
distributed mode to display the same basic information. This hasn't
*strictly* been required before because standalone had no worker-level
interaction with the cluster (configs stored in memory, offsets on disk,
and statuses in memory). However, we've always *expected* that a reasonable
configuration was available for the worker and that any overrides were just
that -- customizations on top of the existing config. Although it was
technically *possible* to leave an invalid config for the worker while
providing valid configs for the producers and consumers, that was never the
intent.

Therefore, the argument here is that we *should* be able to rely on a valid
config to connect to the Kafka cluster, whether in standalone or
distributed mode. There should always be a valid "fallback" even if
overrides are provided. We haven't been explicit about this before, but
unless someone objects, I don't think it is unreasonable.
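
To make the "fallback plus overrides" shape concrete, here's roughly the
kind of standalone worker config I have in mind (the hostnames are just
placeholders; producer./consumer. prefixes in the worker properties are the
usual way to override the clients' settings):

    # Worker-level connection -- the fallback the worker itself can rely on
    bootstrap.servers=broker-a:9092

    # Optional overrides for the clients created for connector tasks
    producer.bootstrap.servers=broker-b:9092
    consumer.bootstrap.servers=broker-b:9092

The point is just that the first setting should always be valid on its own,
even when the overrides point somewhere else.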

Happy to update the KIP w/ these details if someone feels they would be
valuable.

-Ewen

On Mon, Dec 11, 2017 at 8:21 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

>
> On Mon, Dec 11, 2017 at 4:01 PM, Gwen Shapira <g...@confluent.io> wrote:
>
>> Thanks, Ewen :)
>>
>> One thing that wasn't clear to me from the wiki: will standalone Connect
>> also have a Kafka cluster ID? While it is true that only tasks have
>> producers and consumers, I think we assumed that all tasks on one
>> standalone worker will use one Kafka cluster?
>>
>
> Yeah, maybe not clear enough in the KIP, but this is what I was getting at
> -- while I think it's possible to use different clusters for the worker,
> producer, and consumer, I don't think this is really expected or a use case
> worth bending over backwards to support perfectly. In standalone mode, a
> value is technically not required because a default is included and we
> currently only use the value for the producers/consumers in tasks. But
> I don't think it is unreasonable to require a valid setting at the worker
> level, even if you override bootstrap.servers for the producer and consumer.
>
>
>>
>> Another suggestion is not to block the REST API on the connection, but
>> rather not return the cluster ID until we know it (return null instead).
>> So
>> clients will need to poll rather than block. Not sure this is better, but
>> you didn't really discuss this, so wanted to raise the option.
>>
>
> It's mentioned briefly in the Proposed Changes section of the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API#KIP-238:ExposeKafkaclusterIDinConnectRESTAPI-ProposedChanges
> I think the tradeoff of blocking the server from being "started" until we
> can at least make one request to the cluster isn't unreasonable, since if
> you can't do that, you're not going to be able to do
> any useful work anyway. Anyone who might otherwise be using this endpoint
> to monitor health (which it is useful for since it doesn't require any
> other external services to be running just to give a response) can just
> interpret connection refused or timeouts as an unhealthy state, as they
> should anyway.
>
> -Ewen
>
>
>>
>> Gwen
>>
>>
>> On Mon, Dec 11, 2017 at 3:42 PM Ewen Cheslack-Postava <e...@confluent.io>
>> wrote:
>>
>> > I'd like to start discussion on a simple KIP to expose Kafka cluster ID
>> > info in the Connect REST API:
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
>> >
>> > Hopefully straightforward, though there are some details on how this
>> > affects startup behavior that might warrant discussion.
>> >
>> > -Ewen
>> >
>>
>
>
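
P.S. For anyone skimming the thread, a rough sketch of the behavior
discussed above -- the port is just the Connect REST default (8083) and the
field name is my shorthand, not necessarily the final form:

    $ curl http://localhost:8083/
    {"version":"...","commit":"...","kafka_cluster_id":"..."}

If the worker can't reach the Kafka cluster, the request just fails with a
connection refused or timeout, which a health check can treat as unhealthy
the same way it would today.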
