Todd Lipcon has posted comments on this change. Change subject: [security] tailored TokenSigner for system catalog ......................................................................
Patch Set 10: (22 comments) Looks pretty good, mostly relatively small comments around clarity http://gerrit.cloudera.org:8080/#/c/5930/10/src/kudu/master/master.cc File src/kudu/master/master.cc: PS10, Line 116: FLAGS_token_signing_key_validity_seconds, : FLAGS_token_signing_key_rotation_seconds, : FLAGS_token_signing_key_propagation_seconds)); : nit: weird indentation PS10, Line 124: RETURN_NOT_OK(token_signer_->TryAddKey(&key)); : if (key) { : RETURN_NOT_OK(token_signer_->AddKey(std::move(key))); this looks kind of strange to me. If "TryAddKey" succeeds, then why are we calling AddKey? Maybe 'TryAddKey' should be called something else that doesn't imply that it is adding it? (haven't yet looked at the header docs for these functions, but from usage it's not intuitive) http://gerrit.cloudera.org:8080/#/c/5930/10/src/kudu/security/token_signer.cc File src/kudu/security/token_signer.cc: PS10, Line 56: DEFINE_int64(token_signing_key_rotation_seconds, 60 * 60 * 24 * 6, : "number of seconds prior of the expiration time of current " : "signing key when next key should be activated"); : T this config seems a bit "backwards" to me. I would expect if I set this to "1 day" then my key would rotate once a day. But in fact I have to set it to 6 days to get it to rotate once a day. PS10, Line 51: DEFINE_int64(token_signing_key_validity_seconds, 60 * 60 * 24 * 7, : "number of seconds that a token signing key is valid for"); : TAG_FLAG(token_signing_key_validity_seconds, advanced); : TAG_FLAG(token_signing_key_validity_seconds, experimental); : : DEFINE_int64(token_signing_key_rotation_seconds, 60 * 60 * 24 * 6, : "number of seconds prior of the expiration time of current " : "signing key when next key should be activated"); : TAG_FLAG(token_signing_key_rotation_seconds, advanced); : TAG_FLAG(token_signing_key_rotation_seconds, experimental); : : DEFINE_int64(token_signing_key_propagation_seconds, 60, : "time necessary to propagate public part of TSK " : "via tserver-to-master heartbeats with some margin"); : TAG_FLAG(token_signing_key_propagation_seconds, advanced); : TAG_FLAG(token_signing_key_propagation_seconds, experimental); I think these should be defined in master.cc where they're used - was surprised to find them here even though they aren't referenced in this .cc file. Also keep in mind that the security/ module ends up exported into Impala (because it's a dependency of KRPC) so we should try to avoid Kudu-specific terminology in flag docs. Since you're now passing them in as constructor arguments, though, it'll be fine to move these flags to master.cc and keep the docs more or less as is (Kudu-specific) PS10, Line 89: // Reload the TSKs (presumably originated from the system catalog table). : // This method can be called multiple times. this doc should probably be in the header, not on the impl. Also I think if it can be called multiple times we shouldn't call it 'Init' but rather 'LoadKeys' or something. Our convention is that Init methods must be called exactly once, and can't be concurrent with other access. PS10, Line 91: &k nit: & goes with type PS10, Line 111: if (next_key_seq_num_ <= seq_num) { : // Advance the key sequence number, if needed. : next_key_seq_num_ = seq_num + 1; : } next_key_seq_num_ = std::max(next_key_seq_num_, seq_num + 1) PS10, Line 127: tsk_deque_.clear(); : for (int i = 0; i < 2; ++i) { : auto it = tsk_by_seq.begin(); : if (it != tsk_by_seq.end()) { : tsk_deque_.emplace_back(it->second.release()); : tsk_by_seq.erase(tsk_by_seq.begin()); : } : } hrm, could this be written more simply as: tsk_deque_.clear(); for (auto& e : tsk_by_seq) { tsk_deque_.emplace_back(std::move(e.second)); } (we already know that tsk_by_seq is max 2, right? and no need to erase the entries from that map since we're about to destruct it anyway) Line 139: Status TokenSigner::GenerateAuthnToken(string username, I still think we're going to end up refactoring this back into an AuthnTokenManager that lives in master/ at some point, but we can cross that bridge when we come to it. Line 152: lock_guard<RWMutex> l(lock_); this can be a shared_lock, right? otherwise we are serializing signatures Line 164: lock_guard<RWMutex> l(lock_); can be shared Line 177: lock_guard<RWMutex> l(lock_); I think this could work with just the read lock, with the exception of the next_key_seq_num_ update. But I think it would be fine to acquire the sequence number "up front" under the write lock, or use an atomic for it, since we seem to be OK with potential sequence number gaps. (keep in mind that the RSA key generation is expensive, and that clients would be blocked if you hold the write lock here) Line 187: return Status::OK(); maybe returning Status::Aborted() or Status::AlreadyPresent() would be a clearer way to express that there's no new key necessary? http://gerrit.cloudera.org:8080/#/c/5930/10/src/kudu/security/token_signer.h File src/kudu/security/token_signer.h: PS10, Line 63: TryPushKey name doesn't match impl Line 113: // RETURN_NOT_OK(ts.AddKey(std::move(key))); One thing that I think we should document in this class is whether it's OK for there to be "holes" in the sequence numbers. For example, what if you do TryAddKey() and then you time out while adding it to the system table. Then you don't actually know whether it was added or not. When you call TryAddKey() again later, are you going to get a higher sequence number or the same one again? If the former, then you might have a gap in sequence numbers (which I think is probably fine). If the latter, we might have an issue when we then try to store it again and get a duplicate key error. I think my preference is to specify that there could always be gaps in sequence numbers (and I think the rest of the code handles it just fine) PS10, Line 122: // { : // unique_ptr<TokenSigningPrivateKey> key; : // RETURN_NOT_OK(ts.TryAddKey(&key)); : // if (key) { : // // Store the newly generated key into the system table. : // ... : // : // // Add the key into the queue of the TokenSigner. : // RETURN_NOT_OK(ts.AddKey(std::move(key))); : // } : // } : // // Check and switch to the next key, if it's time. : // RETURN_NOT_OK(ts.TryRotateKey()); this is the exact same sequence as we have above, isn't it? I think it would suffice to say that, at startup after loading existing keys, you should always immediately then check if it's also time to rotate the key (because you may have been down for a while or whatever). PS10, Line 153: eximine typo PS10, Line 158: TryAddKey per comments elsewhere, I think a better name for this might be something like GenerateNewKeyIfNeeded() or MaybeGenerateNewKey() since it isn't actually trying to _add_ the key. Line 188: std::unique_ptr<TokenVerifier> verifier_; I think the single-ownership model here is going to be a little problematic with the way that Dan's integrating TokenVerifier with the RPC layer. We should figure out the plan to resolve this. Line 197: std::deque<std::unique_ptr<TokenSigningPrivateKey>> tsk_deque_; add a doc here? which one is the current one? the head of the queue? http://gerrit.cloudera.org:8080/#/c/5930/10/src/kudu/security/token_verifier.h File src/kudu/security/token_verifier.h: PS10, Line 53: static const int64_t kPreStartSeqNum; doc PS10, Line 60: kPreStartSeqNum is the constant for this actually helping clarity or obfuscating it? -- To view, visit http://gerrit.cloudera.org:8080/5930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie2417e2ccba6a1114db366b2f642f95362bf479c Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes