Re: KIP-4 Wiki Update

Grant Henke Wed, 30 Mar 2016 08:04:21 -0700

No one attended this meeting. If you would like to discuss the topic,
please let me know.


Thank you,
Grant

On Mon, Mar 28, 2016 at 9:18 AM, Grant Henke <ghe...@cloudera.com> wrote:

> I am hoping to get more feedback on the blocking vs async question so I
> can start to get KIP-4 patches reviewed.
>
> In order to facilitate a faster discussion I will hold an open discussion
> on Tuesday March 29th at 12pm PST (right after the usual KIP call, if we
> have one). Please join via the hangouts link below:
>
>    - https://plus.google.com/hangouts/_/cloudera.com/discuss-kip-4
>
> If you can't make that time, please suggest another time you would like to
> meet and I can hold another meeting too. I will take notes of the meetings
> and update here.
>
> Thank you,
> Grant
>
> On Tue, Mar 15, 2016 at 9:49 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
>> Moving the relevant wiki text here for discussion/tracking:
>>>
>>> Server-side Admin Request handlers
>>>
>>> At the highest level, admin requests will be handled on the brokers the
>>> same way that all message types are. However, because admin messages modify
>>> cluster metadata they should be handled by the controller. This allows the
>>> controller to propagate the changes to the rest of the cluster. However,
>>> the fact that the messages need to be handled by the controller does not
>>> necessarily mean they need to be sent directly to the controller. A message
>>> forwarding mechanism can be used to forward the message from any broker to
>>> the correct broker for handling.
>>>
>>> Because supporting all of this is quite the undertaking, I will describe
>>> the "ideal functionality" and then the "intermediate functionality" that
>>> gets us some basic administrative support quickly while working towards the
>>> optimal state.
>>>
>>> *Ideal Functionality:*
>>>
>>>    1. A client sends an admin request to *any* broker
>>>    2. The admin request is forwarded to the required broker (likely the
>>>    controller)
>>>    3. The request is handled and the server blocks until a timeout is
>>>    reached or the requested operation is completed (failure or success)
>>>       1. An operation is considered complete/successful when *all
>>>       required nodes have the correct/current state*.
>>>       2. Immediate follow up requests to *any broker* will succeed.
>>>       3. Requests that timeout may still be completed after the
>>>       timeout. The users would need to poll to check the state.
>>>    4. The response is generated and forwarded back to the broker that
>>>    received the request.
>>>    5. A response is sent back to the client.
>>>
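The ideal flow above can be sketched as a toy in-memory model. This is not Kafka code: `Broker`, `Cluster`, `create_topic`, and `hops` are names invented for illustration, and propagation is modeled as a plain synchronous loop rather than real inter-broker requests.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    broker_id: int
    topics: set = field(default_factory=set)


class Cluster:
    """Toy in-memory cluster used only to illustrate the request flow."""

    def __init__(self, broker_ids, controller_id):
        self.brokers = {i: Broker(i) for i in broker_ids}
        self.controller_id = controller_id
        self.hops = []  # brokers that touched the request, in order

    def create_topic(self, receiving_id, topic):
        self.hops.append(receiving_id)
        if receiving_id != self.controller_id:
            # Step 2: forward to the controller. Steps 4-5: the controller's
            # response travels back through this broker to the client.
            return self.create_topic(self.controller_id, topic)
        # Step 3: the operation completes only once every broker has the state.
        for broker in self.brokers.values():
            broker.topics.add(topic)
        return "OK"


cluster = Cluster(broker_ids=[1, 2, 3], controller_id=2)
print(cluster.create_topic(receiving_id=1, topic="events"))  # OK
print(cluster.hops)  # [1, 2]
```

Because every broker holds the topic before "OK" is returned, an immediate follow-up request to any broker in this model would see it, which is the property step 3.2 describes.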
>>> *Intermediate Functionality*:
>>>
>>>    1. A client sends an admin request to *the controller* broker
>>>       1. As a follow-up, request forwarding can be added transparently.
>>>       (see below)
>>>    2. The request is handled and the server blocks until a timeout is
>>>    reached or the requested operation is completed (failure or success)
>>>       1. An operation is considered complete/successful when *the
>>>       controller node has the correct/current state.*
>>>       2. Immediate follow up requests to *the controller* will succeed.
>>>       Others (not to the controller) are likely to succeed or to hit a
>>>       retriable exception and eventually succeed on retry.
>>>       3. Requests that timeout may still be completed after the
>>>       timeout. The users would need to poll to check the state.
>>>    3. A response is sent back to the client.
>>>
>>> The ideal functionality has 2 features that are more challenging
>>> initially. For that reason those features will be removed from the initial
>>> changes, but will be tracked as follow up improvements. However, this
>>> intermediate solution should allow for a relatively transparent transition
>>> to the ideal functionality.
>>>
>>> *Request Forwarding: KAFKA-1912
>>> <https://issues.apache.org/jira/browse/KAFKA-1912>*
>>>
>>> Request forwarding is relevant to any message that needs to be sent to
>>> the "correct" broker (ex: partition leader, group coordinator, etc). Though
>>> at first it may seem simple, it has many technical challenges that need to
>>> be resolved with regard to connections, failure, retries, etc. Today, we
>>> depend on the client to choose the correct broker, and clients that want to
>>> utilize the cluster "optimally" would likely continue to do so. For
>>> those reasons it can be handled generically as an independent feature.
>>>
>>> *Cluster Consistent Blocking:*
>>>
>>> Blocking an admin request until the entire cluster is aware of the
>>> correct/current state is difficult based on Kafka's current approach for
>>> propagating metadata. This approach varies based on the metadata
>>> being changed.
>>>
>>>    - Topic metadata changes are propagated via UpdateMetadata and
>>>    LeaderAndIsr requests
>>>    - Config changes are propagated via zookeeper and listeners
>>>    - ACL changes depend on the implementation of the Authorizer
>>>    interface
>>>       - The default SimpleACLAuthorizer uses zookeeper and listeners
>>>
>>> Though all of these mechanisms are different, they are all commonly
>>> "eventually consistent". None of the mechanisms, as currently implemented,
>>> will block until the metadata has been propagated successfully. Changing
>>> this behavior would require a large amount of change to the
>>> KafkaController, additional inter-broker messages, and potentially a change
>>> to the Authorizer interface. These are all changes that should not
>>> block the implementation of KIP-4.
>>>
>>> The intermediate changes in KIP-4 should allow an easy transition to
>>> "complete blocking" when the work can be done. This is supported by
>>> providing *optional* local blocking in the meantime. This local
>>> blocking only blocks until the local state on the controller is correct. We
>>> will still provide a polling mechanism for users that do not want to block
>>> at all. A polling mechanism is required in the optimal implementation too,
>>> because operations like "create topic" are not transactional and users
>>> still need a way to check state after a timeout occurs. Local
>>> blocking has the added benefit of avoiding wasted poll requests to other
>>> brokers when it's impossible for the request to be completed. If the
>>> controller's state is not correct, then the other brokers' can't be either.
>>> Clients who don't want to validate that the entire cluster state is correct
>>> can block on the controller and avoid polling altogether with reasonable
>>> confidence that, though they may get a retriable error on follow up
>>> requests, the requested change was successful and the cluster will be
>>> accurate eventually.
>>>
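The "block first, poll only after a timeout" pattern described above could look roughly like this on the client side. It is a minimal sketch under assumptions: `create_and_confirm`, `send_request`, `topic_exists`, and the status strings "OK", "TIMED_OUT", and "UNKNOWN" are hypothetical and do not correspond to any existing Kafka client API.

```python
import time


def create_and_confirm(send_request, topic_exists, timeout_ms,
                       max_polls=10, poll_interval_s=0.5, sleep=time.sleep):
    """Block on the admin request first; poll only if it timed out.

    send_request(timeout_ms) and topic_exists() are caller-supplied
    callables (hypothetical stand-ins for real client calls).
    """
    result = send_request(timeout_ms)
    if result != "TIMED_OUT":
        return result  # "OK" or a hard error: no polling needed
    # The operation may still complete after the timeout, so check the state.
    for _ in range(max_polls):
        if topic_exists():
            return "OK"
        sleep(poll_interval_s)
    return "UNKNOWN"


# Simulated run: the request times out, and the topic appears on the third poll.
polls = iter([False, False, True])
status = create_and_confirm(
    send_request=lambda timeout_ms: "TIMED_OUT",
    topic_exists=lambda: next(polls),
    timeout_ms=100,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(status)  # OK
```

A client that is content with controller-local blocking would simply stop at the first return and treat any later retriable error as temporary.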
>>> Because we already add a timeout field to the request wire protocols,
>>> changing the behavior to block until the cluster is consistent in the
>>> future would not require a protocol change, though the version could be
>>> bumped to indicate a behavior change.
>>>
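To illustrate why a timeout field already on the wire leaves room for a later behavior change, here is a small sketch. The byte layout (int16 name length, UTF-8 name, int32 timeout) and the names `encode_create_topic`, `decode_create_topic`, and `handle` are assumptions made for this example, not the actual KIP-4 schema.

```python
import struct


def encode_create_topic(topic: str, timeout_ms: int) -> bytes:
    """Hypothetical layout: int16 name length, name bytes, int32 timeout."""
    name = topic.encode("utf-8")
    return struct.pack(f">h{len(name)}si", len(name), name, timeout_ms)


def decode_create_topic(payload: bytes):
    (name_len,) = struct.unpack_from(">h", payload, 0)
    name, timeout_ms = struct.unpack_from(f">{name_len}si", payload, 2)
    return name.decode("utf-8"), timeout_ms


def handle(payload: bytes, wait_for: str = "controller") -> str:
    # wait_for is a server-side choice; it never appears in the payload.
    topic, timeout_ms = decode_create_topic(payload)
    return f"create {topic}: wait up to {timeout_ms} ms for {wait_for}"


payload = encode_create_topic("events", 5000)
print(handle(payload, wait_for="controller"))
print(handle(payload, wait_for="cluster"))
```

Both calls consume identical bytes. Only the server's interpretation of the timeout differs, which is the kind of change a version bump could signal.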
>>
>> Thanks,
>> Grant
>>
>>
>> On Mon, Mar 14, 2016 at 5:07 PM, Grant Henke <ghe...@cloudera.com> wrote:
>>
>>> I have been updating the KIP-4 wiki page based on the last KIP call and
>>> wanted to get some review and discussion around the server side
>>> implementation for admin requests, both the "ideal" functionality and the
>>> "intermediate" functionality. The updates are still in progress, but this
>>> section is the most critical and will likely have the most discussion. This
>>> topic has had a few shifts in perspective and various discussions on
>>> synchronous vs asynchronous server support. The wiki contains my current
>>> perspective on the challenges and approach.
>>>
>>> If you have any thoughts or feedback on the "Server-side Admin Request
>>> handlers" section here
>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-2.Server-sideAdminRequesthandlers>,
>>> let's discuss them in this thread.
>>>
>>> For reference the last KIP discussion can be viewed here:
>>> https://youtu.be/rFW0-zJqg5I?t=12m30s
>>>
>>> Thank you,
>>> Grant
>>> --
>>> Grant Henke
>>> Software Engineer | Cloudera
>>> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>>
>>
>>
>>
>> --
>> Grant Henke
>> Software Engineer | Cloudera
>> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>>
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
