Hi Joel,

Thanks for the review and I plan to update the KIP today with all the updated 
info. My comments in line below.

Thanks
Parth


On 4/20/15, 10:07 AM, "Joel Koshy" 
<jjkosh...@gmail.com<mailto:jjkosh...@gmail.com>> wrote:

Hi Parth,

Nice work on this KIP.  I did another read through and had a few more
comments (with edits after I went through the thread). Many of these
comments were brought up by others as well, so it appears that the KIP
would benefit from an update at this point to incorporate comments
from the thread and last hangout.

- The operation enum is mostly self-explanatory, but it would help
  (for the sake of clarity and completeness if nothing else) to
  document exactly what each of the enums are. E.g., I think this came
  up in our hangout - SEND_CONTROL_MESSAGE is unclear and I don't
  remember what was said about it. <Edit>: After going through the
  thread it seems the conclusion was to categorize operations. E.g.,
  WRITE could apply to multiple requests. Again, this is unclear, so
  if it would be great if you could update the KIP to clarify what you
  intend.

Will add to document. SEND_CONTROL_MESSAGE Probably a very bad name but these 
are intra borker API calls like controller notifying other brokers to update 
metadata or heartbeats. Any better naming suggestions?

- When you update the KIP to categorize the requests it would also
  help to have a column for what the resource is for each.

Will add to the KIP.

- FWIW I prefer a 1-1 mapping between requests and operations. I think
  categorizing requests into these can be confusing because:
  - The resource being protected for different requests will be
    different. We are mostly thinking about topics (read/write) but
    there are requests for which topic is not the right resource.
    E.g., for topic creation, the resource as you suggested would be
    something global/common such as “cluster”. For
    OffsetCommit/FetchRequest, the resource may be the consumer group,
    or maybe a tuple of <consumer group, topic>. So this can be
    confusing - i.e., different resources and request types in the
    same category. It may be simpler and clearer to just have a 1-1
    mapping between the operation enum and requests.

I only see 2 resource categories right now cluster and topic.  I don’t really 
care one way or another so we can probably make a quick decision in tomorrow’s 
meeting to either to 1-1 mapping or have categorization?

  - Some requests that are intuitively READ have WRITE side-effects.
    E.g., (currently) TopicMetadataRequest with auto-create, although
    that will eventually go away. ConsumerMetadataRequest still
    auto-creates the offsets topic. Likewise, ADMIN-type requests may
    be interpreted as having side-effects (depending on who you ask).

Yes and what I am doing right now is checking authorization for all possible 
actions i.e. for auto-create it checks if the config has it enabled and if yes, 
check for read + create authorization. Its not very meaningful right now as 
there is no CREATE authorization but I think this is implementation detail, we 
need to ensure we call authorize with all possible operations from KafkaAPI.
- <quote>When an ACL is missing - fail open</quote>. What does missing
  mean? i.e., no explicit ACL for a principal? I'm confused by this
  especially in relation to the precedence of DENY over ALLOW. So per
  the description:
  - If no ACLs exist for topic A then ALLOW all operations on it by
    anyone.
  - If I now add an ACL for a certain principal P to ALLOW (say) WRITE
    to the topic then either:
  - This has the effect of DENYing WRITE to all other principals
  - Or, this ACL serves no purpose
  - If the effect is to DENY WRITE to all other principals, what about
    READ. Do all principals (including P) have READ permissions to
    topic A?
  - In other words, it seems for a specific ACL to be meaningful then
    fail close is necessary for an absent ACL.
  - <edit>After through the thread: it appears that the DENY override
    only applies to the given principal. i.e., in the above case it
    appears that the other principals will in fact be granted access.
    Then this makes the ACL that was added pointless right?

The rule I was going with is
- If there is no ACL I.e. This might be a topic that was created in non secure 
mode or was created before we supported ACLs. We assume you do not want 
authorization and let all requests go through.
- once you add any ACL, we assume you want authorization on the topic and all 
the general authorization rules now start to apply, I.e we fail close if we 
don’t find an ACL that allows access or if we find an ACL that denies access. 
It does not matter if you added a READACL or WRITEACL or ALLOWACL or DENY ACL. 
If you add any ACL, now every user gets checked against that and if it does not 
satisfy the ACL, request fails. I.e. If you add an ACL “Allow write to topic-1 
form user1 from all hosts” , user-1 has write access from all hosts and no 
other user has any access(except for superusers who have all the access).
- Deny ACLS are suppose to be used to restrict access authorized by some allow 
ACL, they are not suppose to be required. Implicitly anyone who does not have 
an allow acl, gets denied. The Deny ACLs are only added to give more control to 
administrators who wants more granular control with lesser config. The scenario 
described in mailing list was “Allow user X access from all hosts but 
Host1,Host2”. in absence of DENY operator you will have to exhaustively list 
all possible hosts in your ACL which is what we are trying to avoid.

- On ZK ACLs: I think ZK will be closed to everyone except Kafka
  brokers. This is a dependency on KIP-4 though. i.e., eventually all
  clients should talk to brokers only via RPC.

Yes.

- Topic owner: list vs single entry - both have issues off the bat
  (although list is more intuitive at least to me), but perhaps you
  could write up some example workflows to clarify the current
  proposal. I was thinking that anyone in the owner list should be
  considered a super-user of the topic and can grant/revoke
  permissions. They should also be allowed to add other principals as
  owners. Even with this it is unclear who should be allowed to remove
  owners.

As you pointed out in the last KIP meeting owners/creators have use out side of 
security context (plain simple auditing). I don’t think the authorizer work 
depends on this, it was my bad to even mention it in first place. I think we 
can have this discussion outside of authorizer/security context and once we 
have a way to get topic owners the default Authorizer can start using it. It 
makes sense to treat all owners as super users and I think it is safe to assume 
superusers can also modify ownership but I think this should not be treated as 
blocking work for authorization.

- What is the effect of deleting a topic - should all associated ACLs
  be deleted as well?
They should be and with acls being stored as part of TopicConfig this was taken 
care of automatically. With the new ACL management API the users will have to 
call remove ACLs explicitly to perform the cleanup. If everyone thinks this 
should be automated , with the new APIs we will need a hook(or poll) to be 
notified when a topic is deleted to perform cleanup.
- TopicConfigCache to store topic-ACLs. As mentioned above, not all
  requests will be tied to topics. We may want to have an entirely
  separate ZK directory for ACLs. We have a similar issue with quotas.
  This ties in with dynamic config management. We can certainly
  leverage the dynamic config management part of topic configs but I
  think we need to have a story for non-topic resources.

In the first proposal I was going with a topic-Acl and cluster-Acl where 
cluster-Acls were json acl local files on all brokers. With the new ACL 
management APIs we are planning to have /kafka-acl node under which all acls 
will be stored in /kakfa-acls/resource-name -> {acl json data}. Cluster acls 
will just have resource name kafka-cluster.

Thanks,

Joel

On Thu, Apr 16, 2015 at 12:15:37AM +0000, Parth Brahmbhatt wrote:
Kafka currently stores logConfig overrides specified during topic creation
in zookeeper, its just an instance of java.util.Properties converted to
json. I am proposing in addition to that we store acls and owner as well
as part of same Properties map.
There is some infrastructure around reading this config, converting it
back to Properties map and most importantly propagating any changes
efficiently which we will be able to leverage. As this infrastructure is
common to the cluster the reading (not interpreting) of config happens
outside of any authorization code.
If the TopicConfigCache just kept the json representation and left it to
authorizer to parse it, the authorizer will have to either parse the json
for each request(not acceptable) or it will have to keep one more layer of
parsed ACL instance cache. Assuming authorizer will keep an additional
caching layer we will now have to implement some way to invalidate the
cache which means the TopicConfigCache will have to be an observable which
the Authorizer observes and invalidates its cache entries when
topicConfigCache gets updated. Seemed like unnecessary complexity with not
lot to gain so I went with TopicConfigCache interpreting the json and
caching a higher level modeled object.
In summary, the interpretation is done for both optimization and
simplicity. If you think it is important to allow custom ACL format
support we can add one more pluggable config(acl.parser) and
interface(AclParser) or it could just be another method in Authorizer.
One thing to note the current ACL json is versioned so it is easy to make
changes to it however it won’t be possible to support custom ACL formats
with the current design.
Thanks
Parth
On 4/15/15, 4:29 PM, "Michael Herstine" 
<mherst...@linkedin.com.INVALID<mailto:mherst...@linkedin.com.INVALID>>
wrote:
>Hi Parth,
>
>I’m a little confused: why would Kafka need to interpret the JSON?  IIRC
>KIP-11 even says that the TopicConfigData will just store the JSON. I’m
>not really making a design recommendation here, just trying to understand
>what you’re proposing.
>
>On 4/15/15, 11:20 AM, "Parth Brahmbhatt" 
><pbrahmbh...@hortonworks.com<mailto:pbrahmbh...@hortonworks.com>>
>wrote:
>
>>Hi Michael,
>>
>>There is code in kafka codebase that reads and interprets the topic
>>config JSON which has acls, owner and logconfig properties. There are 3
>>use cases that we are supporting with current proposal:
>>
>>  *   You use out of box simpleAcl authorizer which is tied to the acl
>>stored in topic config and the format is locked down.
>>  *   You have a custom authorizer and a custom ACL store.  Ranger/Argus
>>falls under this as they have their own acl store and ui that users use
>>to configure acls on the cluster and cluster resources  like topic. It is
>>upto the custom authorizer to leverage the kafka acl configs or
>>completely ignore them as they have set a user expectation that only acls
>>configured via their ui/system will be effective.
>>  *   You have a custom authorizer but no custom Acl store. You are
>>completely tied to Acl structure that we have provided in out of box
>>implementation.
>>
>>Thanks
>>Parth
>>
>>On 4/15/15, 10:31 AM, "Michael Herstine"
>><mherst...@linkedin.com.INVALID<mailto:mherst...@linkedin.com.INVALID><mailto:mherst...@linkedin.com.INVALID>>
>>wrote:
>>
>>Hi Parth,
>>
>>One question that occurred to me at the end of today’s hangout: how tied
>>are we to a particular ACL representation under your proposal? I know
>>that
>>TopicConfigCache will just contain JSON— if a particular site decides
>>they
>>want to represent their ACLs differently, and swap out the authorizer
>>implementation, will that work?  I guess what I’m asking is whether
>>there’s any code in the Kafka codebase that will interpret that JSON, or
>>does that logic live exclusively in the authorizer?
>>
>>On 4/14/15, 10:56 PM, "Don Bosco Durai"
>><bo...@apache.org<mailto:bo...@apache.org><mailto:bo...@apache.org>> wrote:
>>
>>I also feel, having just IP would be more appropriate. Host lookup will
>>unnecessary slow things down and would be insecure as you pointed out.
>>
>>With IP, it will be also able to setup policies (in future if needed)
>>with
>>ranges or netmasks and it would be more scalable.
>>
>>Bosco
>>
>>
>>On 4/14/15, 1:40 PM, "Michael Herstine"
>><mherst...@linkedin.com.INVALID<mailto:mherst...@linkedin.com.INVALID><mailto:mherst...@linkedin.com.INVALID>>
>>wrote:
>>
>>Hi Parth,
>>
>>Sorry to chime in so late, but I’ve got a minor question on the KIP.
>>
>>Several methods take a parameter named “host” of type String. Is that
>>intended to be a hostname, or an IP address? If the former, I’m curious
>>as
>>to how that’s found (in my experience, when accepting an incoming socket
>>connection, you only know the IP address, and there isn’t a way to map
>>that to a hostname without a round trip to a DNS server, which is
>>insecure
>>anyway).
>>
>>
>>On 3/25/15, 1:07 PM, "Parth Brahmbhatt"
>><pbrahmbh...@hortonworks.com<mailto:pbrahmbh...@hortonworks.com><mailto:pbrahmbh...@hortonworks.com>>
>>wrote:
>>
>>Hi all,
>>
>>I have modified the KIP to reflect the recent change request from the
>>reviewers. I have been working on the code and I have the server side
>>code
>>for authorization ready. I am now modifying the command line utilities.
>>I
>>would really appreciate if some of the committers can spend sometime to
>>review the KIP so we can make progress on this.
>>
>>Thanks
>>Parth
>>
>>On 3/18/15, 2:20 PM, "Michael Herstine"
>><mherst...@linkedin.com.INVALID<mailto:mherst...@linkedin.com.INVALID><mailto:mherst...@linkedin.com.INVALID>>
>>wrote:
>>
>>Hi Parth,
>>
>>Thanks! A few questions:
>>
>>1. Do you want to permit rules in your ACLs that DENY access as well as
>>ALLOW? This can be handy setting up rules that have exceptions. E.g.
>>“Allow principal P to READ resource R from all hosts” with “Deny
>>principal
>>P READ access to resource R from host H1” in combination would allow P
>>to
>>READ R from all hosts *except* H1.
>>
>>2. When a topic is newly created, will there be an ACL created for it?
>>If
>>not, would that not deny subsequent access to it?
>>
>>(nit) Maybe use Principal instead of String to represent principals?
>>
>>
>>On 3/9/15, 11:48 AM, "Don Bosco Durai"
>><bo...@apache.org<mailto:bo...@apache.org><mailto:bo...@apache.org>> wrote:
>>
>>Parth
>>
>>Overall it is looking good. Couple of questionsŠ
>>
>>- Can you give an example how the policies will look like in the
>>default
>>implementation?
>>- In the operations, can we support ³CONNECT² also? This can be used
>>during Session connection
>>- Regarding access control for ³Topic Creation², since we can¹t do it
>>on
>>the server side, can we de-scope it for? And plan it as a future
>>feature
>>request?
>>
>>Thanks
>>
>>Bosco
>>
>>
>>On 3/6/15, 8:10 AM, "Harsha" 
>><ka...@harsha.io<mailto:ka...@harsha.io><mailto:ka...@harsha.io>>
>>wrote:
>>
>>Hi Parth,
>>            Thanks for putting this together. Overall it looks good
>>to
>>            me. Although AdminUtils is a concern KIP-4  can probably
>>fix
>>            that part.
>>Thanks,
>>Harsha
>>
>>On Thu, Mar 5, 2015, at 10:39 AM, Parth Brahmbhatt wrote:
>>Forgot to add links to wiki and jira.
>>Link to wiki:
>>https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authoriza
>>t
>>i
>>o
>>n
>>+
>>Interface
>>Link to Jira: https://issues.apache.org/jira/browse/KAFKA-1688
>>Thanks
>>Parth
>>From: Parth Brahmbhatt
>><pbrahmbh...@hortonworks.com<mailto:pbrahmbh...@hortonworks.com><mailto:pbrahmbh...@hortonworks.com><mailto:p
>>b
>>rahmbh...@hortonworks.com<mailto:rahmbh...@hortonworks.com>>>
>>Date: Thursday, March 5, 2015 at 10:33 AM
>>To:
>>"dev@kafka.apache.org<mailto:dev@kafka.apache.org><mailto:dev@kafka.apache.org><mailto:dev@kafka.apach
>>e
>>.org>"
>><dev@kafka.apache.org<mailto:dev@kafka.apache.org><mailto:dev@kafka.apache.org><mailto:dev@kafka.apach
>>e
>>.org>>
>>Subject: [DISCUSS] KIP-11- Authorization design for kafka security
>>Hi,
>>KIP-11 is open for discussion , I have updated the wiki with the
>>design
>>and open questions.
>>Thanks
>>Parth
>>
>>
>>
>>
>>
>>
>>
>>
>>
>


Reply via email to