And to clarify a bit further: the goal is for both standalone and distributed mode to display the same basic information. This hasn't *strictly* been required before because standalone mode had no worker-level interaction with the cluster (configs stored in memory, offsets on disk, and statuses in memory). However, we've always *expected* that a reasonable configuration was available for the worker and that any overrides were just that -- customizations on top of the existing config. Although it was technically *possible* to leave an invalid config for the worker while providing valid configs for producers and consumers, this was never the intent.
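To make that expectation concrete, a minimal sketch of a worker config following this model might look like the fragment below (host names are placeholders; the `producer.`/`consumer.` prefixes are Connect's existing worker-config override mechanism):

```properties
# Worker-level connection: the "fallback" that must always be valid.
bootstrap.servers=kafka-a:9092

# Optional overrides for task producers/consumers; these are
# customizations on top of the worker config, not replacements for it.
producer.bootstrap.servers=kafka-b:9092
consumer.bootstrap.servers=kafka-b:9092
```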
Therefore, the argument here is that we *should* be able to rely on a valid config to connect to the Kafka cluster, whether in standalone or distributed mode. There should always be a valid "fallback" even if overrides are provided. We haven't been explicit about this before, but unless someone objects, I don't think it is unreasonable. Happy to update the KIP with these details if someone feels they would be valuable.

-Ewen

On Mon, Dec 11, 2017 at 8:21 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> On Mon, Dec 11, 2017 at 4:01 PM, Gwen Shapira <g...@confluent.io> wrote:
>
>> Thanks, Ewen :)
>>
>> One thing that wasn't clear to me from the wiki: will standalone Connect
>> also have a Kafka cluster ID? While it is true that only tasks have
>> producers and consumers, I think we assumed that all tasks on one
>> standalone worker will use one Kafka cluster?
>
> Yeah, maybe not clear enough in the KIP, but this is what I was getting at
> -- while I think it's possible to use different clusters for the worker,
> producer, and consumer, I don't think this is really expected or a use
> case worth bending over backwards to support perfectly. In standalone
> mode, a value is technically not required, because a default is included
> and we currently only use the value for the producers/consumers in tasks.
> But I don't think it is unreasonable to require a valid setting at the
> worker level, even if you override bootstrap.servers for the producer and
> consumer.
>
>> Another suggestion is not to block the REST API on the connection, but
>> rather not return the cluster ID until we know it (return null instead),
>> so clients will need to poll rather than block. Not sure this is better,
>> but you didn't really discuss this, so I wanted to raise the option.
> It's mentioned briefly in
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API#KIP-238:ExposeKafkaclusterIDinConnectRESTAPI-ProposedChanges
> I think the tradeoff of blocking the server from being "started" until we
> can make at least one request to the cluster isn't unreasonable, since if
> you can't do that, you're not going to be able to do any useful work
> anyway. Anyone who might otherwise be using this endpoint to monitor
> health (which it is useful for, since it doesn't require any other
> external services to be running just to give a response) can simply
> interpret connection refused or timeouts as an unhealthy state, as they
> should anyway.
>
> -Ewen
>
>> Gwen
>>
>> On Mon, Dec 11, 2017 at 3:42 PM Ewen Cheslack-Postava <e...@confluent.io>
>> wrote:
>>
>>> I'd like to start discussion on a simple KIP to expose the Kafka
>>> cluster ID in the Connect REST API:
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
>>>
>>> Hopefully straightforward, though there are some details on how this
>>> affects startup behavior that might warrant discussion.
>>>
>>> -Ewen
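The health-check interpretation described above can be sketched as a small client-side probe. This is an illustration, not part of the KIP: the default port 8083 and the `kafka_cluster_id` field name in the root endpoint's JSON are assumptions about the proposed response shape, and any connection error is folded into the same "unhealthy" result as a missing ID.

```python
import json
from urllib import error, request


def probe_connect_health(url="http://localhost:8083/", timeout=5.0):
    """Probe a Connect worker's root endpoint.

    Returns the Kafka cluster ID string if the worker is up and reporting
    one, or None for connection refused / timeout / malformed response --
    all of which a monitor should treat as unhealthy, per the discussion.
    """
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (error.URLError, OSError, ValueError):
        return None
    # Field name is an assumption about the KIP-238 response, which also
    # carries "version" and "commit" in the root endpoint today.
    return body.get("kafka_cluster_id")


# With no worker running, a refused connection reads as unhealthy:
print(probe_connect_health("http://127.0.0.1:1/", timeout=0.5))  # None
```

A monitor built this way needs no services beyond the worker itself, which is the property Ewen points out above.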