Thanks, Ewen. I think the KIP is clear enough about the intent and the changed behavior.

On Tue, Dec 12, 2017 at 12:22 AM, Ewen Cheslack-Postava <[email protected]> wrote:

> And to clarify a bit further: the goal is for both standalone and
> distributed mode to display the same basic information. This hasn't
> *strictly* been required before because standalone had no worker-level
> interaction with the cluster (configs stored in memory, offsets on disk,
> and statuses in memory). However, we've always *expected* that a reasonable
> configuration was available for the worker and that any overrides were just
> that -- customizations on top of the existing config. Although it could
> have been *possible* to leave an invalid config for the worker yet provide
> valid configs for producers and consumers, this was never the intent.
>
> Therefore, the argument here is that we *should* be able to rely on a valid
> config to connect to the Kafka cluster, whether in standalone or
> distributed mode. There should always be a valid "fallback" even if
> overrides are provided. We haven't been explicit about this before, but
> unless someone objects, I don't think it is unreasonable.
>
> Happy to update the KIP w/ these details if someone feels they would be
> valuable.
>
> -Ewen
>
> On Mon, Dec 11, 2017 at 8:21 PM, Ewen Cheslack-Postava <[email protected]>
> wrote:
>
> > On Mon, Dec 11, 2017 at 4:01 PM, Gwen Shapira <[email protected]> wrote:
> >
> >> Thanks, Ewen :)
> >>
> >> One thing that wasn't clear to me from the wiki: Will standalone connect
> >> also have a Kafka cluster ID? While it is true that only tasks have
> >> producers and consumers, I think we assumed that all tasks on one
> >> stand-alone will use one Kafka cluster?
> >>
> >
> > Yeah, maybe not clear enough in the KIP, but this is what I was getting at
> > -- while I think it's possible to use different clusters for worker,
> > producer, and consumer, I don't think this is really expected or a use case
> > worth bending backwards to support perfectly. In standalone mode,
> > technically a value is not required because a default is included and we
> > only utilize the value currently for the producers/consumers in tasks. But
> > I don't think it is unreasonable to require a valid setting at the worker
> > level, even if you override the bootstrap.servers for producer and consumer.
> >
> >>
> >> Another suggestion is not to block the REST API on the connection, but
> >> rather not return the cluster ID until we know it (return null instead).
> >> So clients will need to poll rather than block. Not sure this is better,
> >> but you didn't really discuss this, so wanted to raise the option.
> >>
> >
> > It's mentioned briefly in
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API#KIP-238:ExposeKafkaclusterIDinConnectRESTAPI-ProposedChanges
> > I think the tradeoff of blocking the server from
> > being "started" until we can at least make one request to the cluster isn't
> > unreasonable since if you can't do that, you're not going to be able to do
> > any useful work anyway. Anyone who might otherwise be using this endpoint
> > to monitor health (which it is useful for since it doesn't require any
> > other external services to be running just to give a response) can just
> > interpret connection refused or timeouts as an unhealthy state, as they
> > should anyway.
> >
> > -Ewen
> >
> >>
> >> Gwen
> >>
> >>
> >> On Mon, Dec 11, 2017 at 3:42 PM Ewen Cheslack-Postava <[email protected]>
> >> wrote:
> >>
> >> > I'd like to start discussion on a simple KIP to expose Kafka cluster ID
> >> > info in the Connect REST API:
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
> >> >
> >> > Hopefully straightforward, though there are some details on how this
> >> > affects startup behavior that might warrant discussion.
> >> >
> >> > -Ewen
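
For anyone monitoring health through this endpoint, here is a rough sketch of what "interpret connection refused or timeouts as an unhealthy state" could look like on the client side. It is only an illustration, not part of the KIP: the function names are made up, the `kafka_cluster_id` field name is my reading of the KIP's proposed response for the root resource, and the URL in the usage note assumes a worker listening on the default REST port. It also treats a missing or null cluster ID as "not ready", which would cover the polling alternative Gwen raised.

```python
import json
import urllib.error
import urllib.request


def interpret_root_response(body):
    """Return (healthy, cluster_id) for the JSON body of GET / on a worker.

    Assumes the response is a JSON object that carries the cluster ID
    under "kafka_cluster_id" (field name assumed from the KIP).
    """
    try:
        info = json.loads(body)
    except ValueError:
        # Not JSON at all: treat as unhealthy rather than guessing.
        return (False, None)
    if not isinstance(info, dict):
        return (False, None)
    cluster_id = info.get("kafka_cluster_id")
    # A missing or null ID means the worker has not reported a cluster yet.
    return (cluster_id is not None, cluster_id)


def check_worker(base_url, timeout_s=5.0):
    """Hypothetical health probe: any connection failure counts as unhealthy."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout_s) as resp:
            return interpret_root_response(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeouts, HTTP errors, and undecodable bodies
        # all end up here.
        return (False, None)
```

Usage would be something like `healthy, cluster_id = check_worker("http://localhost:8083/")`, retried on whatever interval the monitoring system already uses.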
