[jira] [Updated] (HADOOP-10569) Normalize Hadoop Audit Logs
[ https://issues.apache.org/jira/browse/HADOOP-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Shukla updated HADOOP-10569:
--
Description:
It will be very useful to normalize the audit format across the various Hadoop components. A common audit format will help tools parse audit records consistently across sub-projects and will make audit details easier for humans to interpret.
If a new common audit format is devised, it will be useful to consider the following W's of audit:
1. What action, with what results - e.g. what was done, action initiated, API invoked, job submitted, etc., and what the results were (success, failure, etc.)
2. Who - e.g. user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer, but with audit event correlation we can provide better context. E.g. a user-submitted Pig script that results in some MR jobs and HDFS reads/writes can be correlated.
There are perhaps two ways to achieve the goal of normalized audit records:
1. A common audit facility - as components take up this common facility, their audit records start conforming to the normalized audit record format.
2. Change each component to produce audit records in a common format.
Approach 1 appears to be more doable.

was: (the same description, without the closing paragraph on the two possible approaches)

> Normalize Hadoop Audit Logs
> ---
>
> Key: HADOOP-10569
> URL: https://issues.apache.org/jira/browse/HADOOP-10569
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Reporter: Vinay Shukla

-- This message was sent by Atlassian JIRA (v6.2#6252)
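The W's listed in the description could map onto a record shape along these lines. This is only a sketch: `AuditRecord` and every field name here are illustrative, not part of any agreed Hadoop format.

```java
import java.time.Instant;

/** Illustrative normalized audit record covering the W's from HADOOP-10569.
 *  All names are hypothetical; no common format has been agreed. */
public final class AuditRecord {
    private final String action;     // What: API invoked, job submitted, ...
    private final String outcome;    // What result: e.g. SUCCESS or FAILURE
    private final String user;       // Who
    private final String proxyUser;  // Who: proxy user, may be null
    private final String clientIp;   // Who: client IP, may be null
    private final Instant timestamp; // When
    private final String component;  // Where: subsystem, e.g. "hdfs.namenode"
    private final String node;       // Where: host name
    private final String traceId;    // Why: correlation id tying a Pig script to its MR jobs

    public AuditRecord(String action, String outcome, String user, String proxyUser,
                       String clientIp, Instant timestamp, String component,
                       String node, String traceId) {
        this.action = action; this.outcome = outcome; this.user = user;
        this.proxyUser = proxyUser; this.clientIp = clientIp; this.timestamp = timestamp;
        this.component = component; this.node = node; this.traceId = traceId;
    }

    /** One key=value line: machine-parseable yet readable by humans. */
    @Override public String toString() {
        return String.format("ts=%s component=%s node=%s user=%s proxyUser=%s ip=%s "
                + "action=%s outcome=%s traceId=%s",
            timestamp, component, node, user,
            proxyUser == null ? "-" : proxyUser,
            clientIp == null ? "-" : clientIp,
            action, outcome, traceId);
    }
}
```

A shared key=value (or JSON) line like this is what would let a single parser handle audit records from every sub-project, and the `traceId` field is one possible vehicle for the cross-component correlation the "Why" item asks for.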
[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API
[ https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987985#comment-13987985 ] Vinay Shukla commented on HADOOP-10433:
---
Thanks for the sample audit record. I have created a JIRA for normalizing audit records: https://issues.apache.org/jira/browse/HADOOP-10569

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 3.0.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: HADOOP-10433.patch (12 revisions), HadoopKMSDocsv2.pdf, KMS-doc.pdf
>
> (from HDFS-6134 proposal)
> Hadoop KMS is the gateway, for Hadoop and Hadoop clients, to the underlying KMS. It provides an interface that works with existing Hadoop security components (authentication, confidentiality).
> Hadoop KMS will be implemented leveraging the work being done in HADOOP-10141 and HADOOP-10177.
> Hadoop KMS will provide an additional implementation of the Hadoop KeyProvider class. This implementation will be a client-server implementation.
> The client-server protocol will be secure:
> * Kerberos HTTP SPNEGO (authentication)
> * HTTPS for transport (confidentiality and integrity)
> * Hadoop ACLs (authorization)
> The Hadoop KMS implementation will not provide additional ACLs for access to encrypted files. For sophisticated access control requirements, HDFS ACLs (HDFS-4685) should be used.
> Basic key administration will be supported by the Hadoop KMS via the already-available Hadoop KeyShell command line tool.
> There are minor changes that must be made to Hadoop KeyProvider functionality:
> * The KeyProvider contract, and the existing implementations, must be thread-safe
> * The KeyProvider API should have a method to generate the key material internally
> * JavaKeyStoreProvider should use, if present, a password provided via configuration
> * KeyProvider Options and Metadata should include a label (for easier cross-referencing)
> To avoid overloading the underlying KeyProvider implementation, the Hadoop KMS will cache keys using a TTL policy.
> Scalability and high availability of the Hadoop KMS can be achieved by running multiple instances behind a VIP/load balancer. For high availability, the underlying KeyProvider implementation used by the Hadoop KMS must itself be highly available.
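The TTL key caching the proposal mentions could look roughly like the following sketch. `TtlKeyCache` and its loader hook are hypothetical names, not actual KMS classes; the point is only that each key lookup hits the backing KeyProvider at most once per TTL window.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical TTL cache in front of a backing key store, sketching the
 *  "cache keys using a TTL policy" idea from the KMS proposal. */
public final class TtlKeyCache {
    private static final class Entry {
        final byte[] material;
        final long expiresAtMillis;
        Entry(byte[] material, long expiresAtMillis) {
            this.material = material;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlKeyCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Return cached key material, loading from the backing provider only when
     *  the entry is missing or its TTL has expired. */
    public byte[] get(String keyName, Function<String, byte[]> loader) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(keyName);
        if (e == null || now >= e.expiresAtMillis) {
            e = new Entry(loader.apply(keyName), now + ttlMillis);
            cache.put(keyName, e);
        }
        return e.material;
    }
}
```

The trade-off the proposal implies: a longer TTL reduces load on the underlying KeyProvider, but changes to a key (e.g. a rollover) take up to one TTL to become visible at a given KMS instance.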
[jira] [Created] (HADOOP-10569) Normalize Hadoop Audit Logs
Vinay Shukla created HADOOP-10569:
-
Summary: Normalize Hadoop Audit Logs
Key: HADOOP-10569
URL: https://issues.apache.org/jira/browse/HADOOP-10569
Project: Hadoop Common
Issue Type: Improvement
Components: security
Reporter: Vinay Shukla

It will be very useful to normalize the audit format across the various Hadoop components. A common audit format will help tools parse audit records consistently across sub-projects and will make audit details easier for humans to interpret.
If a new common audit format is devised, it will be useful to consider the W's of audit:
1. What action, with what results - e.g. what was done, action initiated, API invoked, job submitted, etc., and what the results were (success, failure, etc.)
2. Who - e.g. user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer, but with audit event correlation we can provide better context. E.g. a user-submitted Pig script that results in some MR jobs and HDFS reads/writes can be correlated.
[jira] [Updated] (HADOOP-10569) Normalize Hadoop Audit Logs
[ https://issues.apache.org/jira/browse/HADOOP-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Shukla updated HADOOP-10569:
--
Description:
It will be very useful to normalize the audit format across the various Hadoop components. A common audit format will help tools parse audit records consistently across sub-projects and will make audit details easier for humans to interpret.
If a new common audit format is devised, it will be useful to consider the following W's of audit:
1. What action, with what results - e.g. what was done, action initiated, API invoked, job submitted, etc., and what the results were (success, failure, etc.)
2. Who - e.g. user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer, but with audit event correlation we can provide better context. E.g. a user-submitted Pig script that results in some MR jobs and HDFS reads/writes can be correlated.

was: (the same description, before minor wording and typo fixes)

> Normalize Hadoop Audit Logs
> ---
>
> Key: HADOOP-10569
> URL: https://issues.apache.org/jira/browse/HADOOP-10569
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Reporter: Vinay Shukla
[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API
[ https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987339#comment-13987339 ] Vinay Shukla commented on HADOOP-10433:
---
Is there a sample of the audit log record? What fields are audited? Would it be useful to have a common audit format across Hadoop?

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 3.0.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API
[ https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987185#comment-13987185 ] Vinay Shukla commented on HADOOP-10433:
---
[~tucu00] What is the audit story for the keys? Do we record who performed the various key operations? I couldn't find it in the attached PDFs.

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 3.0.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API
[ https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984908#comment-13984908 ] Vinay Shukla commented on HADOOP-10433:
---
Does the KMS support the KMIP protocol (https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=kmip)? Will the KMS integrate with Hardware Security Module (HSM) devices such as SafeNet Luna and RSA's Data Protection Manager (http://www.emc.com/security/rsa-data-protection-manager.htm)? If the KMS does not speak KMIP, how can Hadoop encryption leverage the enterprise-grade key management investments many enterprise customers already have?

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 3.0.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur