[
https://issues.apache.org/jira/browse/HADOOP-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655097#action_12655097
]
Kan Zhang commented on HADOOP-4359:
-----------------------------------
I plan to introduce an HDFS token, called Access Token, as a vehicle to pass
data access authorization information from NN to DN. One can think of Access
Tokens as capabilities; an Access Token enables its owner to access certain
data blocks. It is issued by NN and used on DN. Access Tokens should be
generated in such a way that their authenticity can be verified by DN.
In general, tokens can be generated in two ways. A) Using a public-key scheme,
where NN chooses a private/public key pair and uses the private key to sign a
token. The signature becomes an integral part of the token. DN is given NN's
public key, which it uses to verify the signature associated with a token.
Since only the NN knows the private key, only the NN can generate a valid
token. B) Using a symmetric-key scheme, where NN and all DNs share a secret
key. For each token, the NN computes a keyed hash (also known as a message
authentication code, or MAC) as the token authenticator. The token
authenticator becomes an integral part of the token. When a DN receives a
token, it uses its copy of the secret key to re-compute the token
authenticator and compares it with the one submitted as part of the token. If
they match, the token is verified as authentic. Since only NN and DNs know the
key (DNs are trusted never to issue tokens; they only use the key to verify
tokens they receive), no third party can forge tokens. Method A has the
advantage that DN doesn't have to store any secret key, and it provides
stronger security in the sense that even if a DN is compromised, the attacker
still can't forge tokens. However, generating and verifying public-key
signatures is expensive compared to symmetric-key operations. I plan to use
method B to generate Access Tokens.
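To make the DN-side check in method B concrete, here is a minimal sketch using
javax.crypto, assuming an HmacSHA1 MAC; the class and method names here are
illustrative, not the actual HDFS API:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.MessageDigest;

// Sketch of method B verification on the DN side.
public class MacVerifier {
  private final SecretKeySpec sharedKey; // pushed from NN, shared by all DNs

  public MacVerifier(byte[] keyBytes) {
    this.sharedKey = new SecretKeySpec(keyBytes, "HmacSHA1");
  }

  // Re-compute the MAC over the token ID and compare it with the
  // authenticator submitted as part of the token.
  public boolean verify(byte[] tokenId, byte[] submittedAuthenticator)
      throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(sharedKey);
    byte[] expected = mac.doFinal(tokenId);
    // Constant-time comparison to avoid leaking match length via timing.
    return MessageDigest.isEqual(expected, submittedAuthenticator);
  }
}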
Access Tokens are ideally non-transferable, i.e., only the owner can use them.
This means we don't have to worry about a token being stolen, for example in
transit. One way to make a token non-transferable is to include the owner's id
in the token and require whoever uses the token to authenticate herself as the
owner specified in the token. I plan to simply include the owner's id in the
token for now, without DN verifying it. Authentication and verification of the
owner id can be added later if needed.
Access Tokens are meant to be lightweight and short-lived. There is no need to
renew or revoke an Access Token; when a cached Access Token expires, the
client simply gets a new one. Access Tokens should be cached only in memory
and never written to disk. A typical use case is as follows. An HDFS client
asks NN for the block ids/locations of a file. NN verifies that the client is
authorized to access the file and sends back the block ids/locations along
with an Access Token for each block. Whenever the HDFS client needs to access
a block, it sends the block id along with its associated Access Token to a DN.
The DN verifies the Access Token before allowing access to the block. The HDFS
client may cache Access Tokens received from NN in memory and only get new
tokens from NN when the cached ones expire or when accessing non-cached
blocks.
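A sketch of the client-side cache just described; the AccessToken shape below
(a token id, its authenticator, and an expiration date) is a stand-in for
illustration, not the actual HDFS class:

import java.util.concurrent.ConcurrentHashMap;

// In-memory only, per the design above: tokens are never written to disk.
public class AccessTokenCache {
  public static class AccessToken {
    final byte[] tokenId;
    final byte[] authenticator;
    final long expirationDate; // millis since epoch

    AccessToken(byte[] tokenId, byte[] authenticator, long expirationDate) {
      this.tokenId = tokenId;
      this.authenticator = authenticator;
      this.expirationDate = expirationDate;
    }
  }

  private final ConcurrentHashMap<Long, AccessToken> byBlockId =
      new ConcurrentHashMap<>();

  // Returns a cached, unexpired token, or null when the client must ask NN
  // again (token expired, or a block it hasn't seen before).
  public AccessToken get(long blockId) {
    AccessToken t = byBlockId.get(blockId);
    if (t == null || t.expirationDate <= System.currentTimeMillis()) {
      byBlockId.remove(blockId);
      return null;
    }
    return t;
  }

  public void put(long blockId, AccessToken token) {
    byBlockId.put(blockId, token);
  }
}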
An Access Token will look like the following, where the access modes include
read, write, replicate, etc.
TokenID = {expirationDate, ownerID, blockID, accessModes}
TokenAuthenticator = HMAC(key, TokenID)
Access Token = {TokenID, TokenAuthenticator}
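This layout translates directly into code. Below is an illustrative NN-side
generation routine; the field serialization order and the HmacSHA1 algorithm
are assumptions for the sketch, not a committed wire format:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class AccessTokenFactory {
  private final SecretKeySpec key;

  public AccessTokenFactory(byte[] keyBytes) {
    this.key = new SecretKeySpec(keyBytes, "HmacSHA1");
  }

  // TokenID = {expirationDate, ownerID, blockID, accessModes}
  static byte[] encodeTokenId(long expirationDate, String ownerId,
                              long blockId, int accessModes)
      throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeLong(expirationDate);
    out.writeUTF(ownerId);
    out.writeLong(blockId);
    out.writeInt(accessModes); // bit set: read, write, replicate, ...
    out.flush();
    return bos.toByteArray();
  }

  // TokenAuthenticator = HMAC(key, TokenID)
  byte[] computeAuthenticator(byte[] tokenId) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(key);
    return mac.doFinal(tokenId);
  }
}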
An Access Token is valid on all DNs regardless of where the data block is
actually stored. The secret key used to compute the token authenticator is
randomly chosen by the NN and sent to DNs when they first register with the
NN. There is a key rolling mechanism that updates this key on the NN and
pushes the new key to the DNs at regular intervals.
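One plausible shape for the key rolling mechanism is sketched below. Retaining
the previous key so that tokens minted just before a roll still verify is an
assumption of the sketch, and the push-to-DN transport is omitted:

import java.security.SecureRandom;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// NN-side key rolling sketch.
public class KeyRoller {
  private final SecureRandom random = new SecureRandom();
  private volatile byte[] currentKey = newKey();
  private volatile byte[] previousKey = null;

  private byte[] newKey() {
    byte[] k = new byte[20]; // HmacSHA1 key size
    random.nextBytes(k);
    return k;
  }

  // Roll keys; a real implementation would also push the new key to all
  // registered DNs here.
  synchronized void roll() {
    previousKey = currentKey;
    currentKey = newKey();
  }

  // Keys a DN may verify against: the current key, plus the previous one so
  // that tokens issued just before a roll remain valid until they expire.
  byte[][] verificationKeys() {
    return previousKey == null
        ? new byte[][] { currentKey }
        : new byte[][] { currentKey, previousKey };
  }

  void startRolling(long intervalHours) {
    ScheduledExecutorService ses =
        Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(this::roll, intervalHours, intervalHours,
        TimeUnit.HOURS);
  }
}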
> Support for data access authorization checking on DataNodes
> -----------------------------------------------------------
>
> Key: HADOOP-4359
> URL: https://issues.apache.org/jira/browse/HADOOP-4359
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Kan Zhang
> Assignee: Kan Zhang
> Fix For: 0.20.0
>
>
> Currently, DataNodes do not enforce any access control on accesses to their
> data blocks. This makes it possible for an unauthorized client to read a data
> block as long as she can supply its block ID. It's also possible for anyone
> to write arbitrary data blocks to DataNodes.
> When users request file accesses on the NameNode, file permission checking
> takes place. Authorization decisions are made with regard to whether the
> requested accesses to those files (and implicitly, to their corresponding
> data blocks) are permitted. However, when it comes to subsequent data block
> accesses on the DataNodes, those authorization decisions are not made
> available to the DataNodes and consequently, such accesses are not verified.
> DataNodes are not capable of reaching those decisions independently, since
> they have no concept of files, let alone file permissions.
> In order to implement data access policies consistently across HDFS services,
> there is a need for a mechanism by which authorization decisions made on the
> NameNode can be faithfully enforced on the DataNodes and any unauthorized
> access is declined.