I didn't get anyone in attendance for this meeting. If you would like to discuss it please let me know.
Thank you,
Grant

On Mon, Mar 28, 2016 at 9:18 AM, Grant Henke <ghe...@cloudera.com> wrote:

> I am hoping to get more discussion and feedback around the blocking vs
> async discussion so I can start to get KIP-4 patches reviewed.
>
> In order to facilitate a faster discussion I will hold an open discussion
> on Tuesday March 29th at 12pm PST (right after the usual KIP call, if we
> have one). Please join via the hangouts link below:
>
>    - https://plus.google.com/hangouts/_/cloudera.com/discuss-kip-4
>
> If you can't make that time, please suggest another time you would like to
> meet and I can hold another meeting too. I will take notes of the meetings
> and update here.
>
> Thank you,
> Grant
>
> On Tue, Mar 15, 2016 at 9:49 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
>> Moving the relevant wiki text here for discussion/tracking:
>>>
>>> Server-side Admin Request handlers
>>>
>>> At the highest level, admin requests will be handled on the brokers the
>>> same way that all message types are. However, because admin messages
>>> modify cluster metadata they should be handled by the controller. This
>>> allows the controller to propagate the changes to the rest of the
>>> cluster. However, the fact that the messages need to be handled by the
>>> controller does not necessarily mean they need to be sent directly to
>>> the controller. A message forwarding mechanism can be used to forward
>>> the message from any broker to the correct broker for handling.
>>>
>>> Because supporting all of this is quite the undertaking, I will describe
>>> the "ideal functionality" and then the "intermediate functionality" that
>>> gets us some basic administrative support quickly while working towards
>>> the optimal state.
>>>
>>> *Ideal Functionality:*
>>>
>>>    1. A client sends an admin request to *any* broker
>>>    2. The admin request is forwarded to the required broker (likely the
>>>    controller)
>>>    3.
The request is handled and the server blocks until a timeout is
>>>    reached or the requested operation is completed (failure or success)
>>>       1. An operation is considered complete/successful when *all
>>>       required nodes have the correct/current state*.
>>>       2. Immediate follow-up requests to *any broker* will succeed.
>>>       3. Requests that time out may still be completed after the
>>>       timeout. The users would need to poll to check the state.
>>>    4. The response is generated and forwarded back to the broker that
>>>    received the request.
>>>    5. A response is sent back to the client.
>>>
>>> *Intermediate Functionality*:
>>>
>>>    1. A client sends an admin request to *the controller* broker
>>>       1. As a follow-up, request forwarding can be added transparently
>>>       (see below).
>>>    2. The request is handled and the server blocks until a timeout is
>>>    reached or the requested operation is completed (failure or success)
>>>       1. An operation is considered complete/successful when *the
>>>       controller node has the correct/current state.*
>>>       2. Immediate follow-up requests to *the controller* will succeed.
>>>       Others (not to the controller) are likely to succeed or cause a
>>>       retriable exception that would eventually succeed.
>>>       3. Requests that time out may still be completed after the
>>>       timeout. The users would need to poll to check the state.
>>>    3. A response is sent back to the client.
>>>
>>> The ideal functionality has 2 features that are more challenging
>>> initially. For that reason those features will be removed from the
>>> initial changes, but will be tracked as follow-up improvements. However,
>>> this intermediate solution should allow for a relatively transparent
>>> transition to the ideal functionality.
>>>
>>> *Request Forwarding: KAFKA-1912
>>> <https://issues.apache.org/jira/browse/KAFKA-1912>*
>>>
>>> Request forwarding is relevant to any message that needs to be sent to
>>> the "correct" broker (ex: partition leader, group coordinator, etc).
Though at first it may seem simple, it has many technical challenges that
>>> need to be decided with regard to connections, failure, retries, etc.
>>> Today, we depend on the client to choose the correct broker, and clients
>>> that want to utilize the cluster "optimally" would likely continue to do
>>> so. For those reasons it can be handled generically as an independent
>>> feature.
>>>
>>> *Cluster Consistent Blocking:*
>>>
>>> Blocking an admin request until the entire cluster is aware of the
>>> correct/current state is difficult based on Kafka's current approach for
>>> propagating metadata. This approach varies based on the metadata that is
>>> changing.
>>>
>>>    - Topic metadata changes are propagated via UpdateMetadata and
>>>    LeaderAndIsr requests
>>>    - Config changes are propagated via zookeeper and listeners
>>>    - ACL changes depend on the implementation of the Authorizer
>>>    interface
>>>       - The default SimpleACLAuthorizer uses zookeeper and listeners
>>>
>>> Though all of these mechanisms are different, they are all "eventually
>>> consistent". None of the mechanisms, as currently implemented, will
>>> block until the metadata has been propagated successfully. Changing this
>>> behavior would require a large amount of change to the KafkaController,
>>> additional inter-broker messages, and potentially a change to the
>>> Authorizer interface. These are all changes that should not block the
>>> implementation of KIP-4.
>>>
>>> The intermediate changes in KIP-4 should allow an easy transition to
>>> "complete blocking" when the work can be done. This is supported by
>>> providing *optional* local blocking in the meantime. This local blocking
>>> only blocks until the local state on the controller is correct. We will
>>> still provide a polling mechanism for users that do not want to block at
>>> all.
A polling mechanism is required in the optimal implementation too,
>>> because users still need a way to check state after a timeout occurs,
>>> since operations like "create topic" are not transactional. Local
>>> blocking has the added benefit of avoiding wasted poll requests to other
>>> brokers when it's impossible for the request to be completed. If the
>>> controller's state is not correct, then the other brokers' state can't
>>> be either. Clients who don't want to validate that the entire cluster
>>> state is correct can block on the controller and avoid polling
>>> altogether, with reasonable confidence that, though they may get a
>>> retriable error on follow-up requests, the requested change was
>>> successful and the cluster will be accurate eventually.
>>>
>>> Because we already add a timeout field to the requests' wire protocols,
>>> changing the behavior to block until the cluster is consistent in the
>>> future would not require a protocol change, though the version could be
>>> bumped to indicate a behavior change.
>>>
>>
>> Thanks,
>> Grant
>>
>>
>> On Mon, Mar 14, 2016 at 5:07 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>
>>> I have been updating the KIP-4 wiki page based on the last KIP call and
>>> wanted to get some review and discussion around the server side
>>> implementation for admin requests, both the "ideal" functionality and
>>> the "intermediate" functionality. The updates are still in progress, but
>>> this section is the most critical and will likely have the most
>>> discussion. This topic has had a few shifts in perspective and various
>>> discussions on synchronous vs asynchronous server support. The wiki
>>> contains my current perspective on the challenges and approach.
>>>
>>> If you have any thoughts or feedback on the "Server-side Admin Request
>>> handlers" section here
>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-2.Server-sideAdminRequesthandlers>,
>>> let's discuss them in this thread.
>>>
>>> For reference the last KIP discussion can be viewed here:
>>> https://youtu.be/rFW0-zJqg5I?t=12m30s
>>>
>>> Thank you,
>>> Grant
>>> --
>>> Grant Henke
>>> Software Engineer | Cloudera
>>> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
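
For anyone following the thread, the "intermediate functionality" quoted above (block on the controller until it completes locally or a timeout is reached, and poll only after a timeout) could be sketched roughly as below. This is only an illustration under stated assumptions: FakeController, create_topic, topic_exists and create_and_confirm are hypothetical names made up for this sketch, not Kafka APIs, and a background thread stands in for the controller applying the change.

```python
import threading
import time


class FakeController:
    """Hypothetical stand-in for the controller broker; not a Kafka API."""

    def __init__(self, apply_delay_s):
        self._apply_delay_s = apply_delay_s  # simulated time to update local state
        self._topics = set()
        self._lock = threading.Lock()

    def topic_exists(self, name):
        """Polling hook: does the controller's local state contain the topic?"""
        with self._lock:
            return name in self._topics

    def create_topic(self, name, timeout_s):
        """Start the operation, then block until it completes locally or times out.

        Returns True if local state was updated within timeout_s, False on
        timeout. A timed-out operation is not cancelled and may still finish.
        """
        done = threading.Event()

        def apply():
            time.sleep(self._apply_delay_s)
            with self._lock:
                self._topics.add(name)
            done.set()

        threading.Thread(target=apply, daemon=True).start()
        return done.wait(timeout_s)


def create_and_confirm(controller, name, timeout_s,
                       poll_interval_s=0.01, max_polls=100):
    """Block on the controller first; fall back to polling only after a timeout."""
    if controller.create_topic(name, timeout_s):
        return "completed"
    for _ in range(max_polls):
        if controller.topic_exists(name):
            return "completed-after-timeout"
        time.sleep(poll_interval_s)
    return "unknown"
```

A client that is happy with controller-local consistency only ever sees the first return path. The polling loop matters only when the timeout fires, which matches the point above that "create topic" is not transactional and may still complete later.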