Hey folks, After https://gerrit.cloudera.org/#/c/4765/ finishes code review, the Kudu daemons will have support for Kerberized RPC. If we start the servers with --server_require_kerberos, then they'll require that all RPC clients authenticate themselves via Kerberos.
Of course authentication isn't that useful on its own. I think the next step along this direction is to add basic "internal service" authorization: - the master should only allow tablet server service principals to heartbeat/register/etc - the tserver should only allow master service principals to create/delete tablets, etc. (with the possible extension to also allow 'root' users to do so) I think there are a couple patterns we could follow here to enforce the authorization *Option 1) Try to enforce ACLs on a per-RPC-service level* This is more or less the approach that Hadoop takes. We can make sure that different RPC use cases are separated into different RPC "services", and apply an ACL on each service. For example, we can separate the current 'MasterService' into 'TserverToMasterService' and 'ClientToMasterService', and then register ACLs on each such service when we start the RPC server. Pros: - when adding a new method, it "automatically" gets the right ACL without having to remember to add special checks - enforces some nice separation and grouping of RPCs Cons: - doesn't handle special cases well, like 'DeleteTablet is usually called by the master, but we also want to allow admins to call the same method for use in manual recovery tools' - would require a bunch of RPC methods to be moved around, and likely break wire compatibility (I believe the RPC header currently specifies the actual service name for each method call) *Option 2) Entirely ad-hoc authorization checks* At the top of each RPC method implementation, we can manually check the authorization. Basically, imagine something like this: bool CheckTabletServerRequest(RpcContext* rpc) { // if 'rpc' is from a valid tablet server, return true // otherwise, respond to the RPC with an authz error and return false } void MasterServiceImpl::TSHeartbeat(const TSHeartbeatRequestPB* req, TSHeartbeatResponsePB* resp, rpc::RpcContext* rpc) { if (!CheckTabletServerRequest(rpc)) return; } Pros: - 100% flexible - easily allows for request-specific special cases - relies on no "magic", it's obvious when you look at the code how the authz check is applied. Cons: - easy to forget to check authorization when adding new RPCs. I recall a CVE or two in the past where we forgot to add the appropriate check in various internal Hadoop RPCs. *Option 3) Per-method authorization checks, declared in .proto file* In the .proto file describing a service, we can add a custom option like: rpc TSHeartbeat(TSHeartbeatRequestPB) returns (TSHeartbeatResponsePB) { option (kudu.authz = "tablet-servers-only") }; When we start the RPC system, we can bind these magic strings to actual method implementations, and give an error if any RPC method either mentions an invalid authz policy, or doesn't mention an authz policy at all. Pros: - we can require that all methods specify a policy (even if they specify 'none'), making it impossible to forget to authorize - still retains flexibility: we can always create an exotic policy implementation for an exotic method as necessary (or specify 'none' and do the custom check from option 2) - we can allow the same protobuf option to be specified as a service-wide 'default' for the cases where an entire service has the same authz policy Cons: - maybe a little more confusing to developers who have to go looking at the .proto file to see which authorization policy is being applied to a method? *Option X? Any other options I missed?* - I only spent 30m or so thinking about this, so maybe I missed a better option? Feedback appreciated. I'm leaning towards option 3 here, but would appreciate feedback before I go too far down that path. -Todd -- Todd Lipcon Software Engineer, Cloudera