[jira] [Updated] (HADOOP-10569) Normalize Hadoop Audit Logs

2014-05-02 Thread Vinay Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Shukla updated HADOOP-10569:
--

Description: 
It will be very useful to normalize the audit format across the various Hadoop 
components.

A common audit format will help tools parse audit records consistently 
across sub-projects and will make audit details easier for humans to interpret.

If a new common audit format is devised, it will be useful to consider the 
following W's of audit:

1. What action, and with what results - e.g., what was done: action initiated, API 
invoked, job submitted, etc. - and what the results were (success, failure, etc.)
2. Who - e.g., user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer directly. However, with audit event correlation we 
can provide better context; e.g., a user-submitted Pig script, the MapReduce jobs it 
spawns, and the resulting HDFS reads/writes can be correlated.

There are perhaps two ways to achieve the goal of normalized audit records:
1. A common audit facility - as components adopt this common 
audit facility, their audit records conform to the normalized audit 
record format.
2. Change each component to produce audit records in a common format.

Approach 1 appears to be more doable.
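Purely as an illustration of the W's above (the field names here are hypothetical, not a proposed standard), a normalized audit record might be assembled like this:

```python
import json
from datetime import datetime, timezone

def build_audit_record(action, result, user, ip, component, node,
                       proxy_user=None, correlation_id=None):
    """Assemble one normalized audit record covering the W's.

    All field names are illustrative only, not an actual Hadoop format.
    """
    record = {
        # What: the action taken and its outcome
        "action": action,
        "result": result,
        # Who: user, optional proxy user, and client IP
        "user": user,
        "proxyUser": proxy_user,
        "clientIp": ip,
        # When: ISO-8601 UTC timestamp
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Where: emitting subsystem/component and node
        "component": component,
        "node": node,
        # Why: a correlation id lets related events (a Pig script, the
        # MR jobs it spawns, the HDFS accesses) be tied together
        "correlationId": correlation_id,
    }
    return json.dumps(record, sort_keys=True)

# Example: an HDFS open audited in the common format
line = build_audit_record("open", "SUCCESS", "alice", "10.0.0.5",
                          "hdfs-namenode", "nn01", correlation_id="job-42")
```

A single structured line per event, with a shared correlation id, is what would let tools join records across sub-projects.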


  was:
It will be very useful to normalize the audit format across the various Hadoop 
components.

A common audit format will help tools parse audit records consistently 
across sub-projects and will make audit details easier for humans to interpret.

If a new common audit format is devised, it will be useful to consider the 
following W's of audit:

1. What action, and with what results - e.g., what was done: action initiated, API 
invoked, job submitted, etc. - and what the results were (success, failure, etc.)
2. Who - e.g., user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer directly. However, with audit event correlation we 
can provide better context; e.g., a user-submitted Pig script, the MapReduce jobs it 
spawns, and the resulting HDFS reads/writes can be correlated.




> Normalize Hadoop Audit Logs
> ---
>
> Key: HADOOP-10569
> URL: https://issues.apache.org/jira/browse/HADOOP-10569
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Vinay Shukla
>
> It will be very useful to normalize the audit format across the various Hadoop 
> components.
> A common audit format will help tools parse audit records 
> consistently across sub-projects and will make audit details easier for humans 
> to interpret.
> If a new common audit format is devised, it will be useful to consider the 
> following W's of audit:
> 1. What action, and with what results - e.g., what was done: action initiated, 
> API invoked, job submitted, etc. - and what the results were (success, failure, 
> etc.)
> 2. Who - e.g., user, proxy user (if available), IP address (if available)
> 3. When - timestamp
> 4. Where - which subsystem, component, node name
> 5. Why - "why" is difficult to answer directly. However, with audit event 
> correlation we can provide better context; e.g., a user-submitted Pig script, 
> the MapReduce jobs it spawns, and the resulting HDFS reads/writes can be 
> correlated.
> There are perhaps two ways to achieve the goal of normalized audit records:
> 1. A common audit facility - as components adopt this common 
> audit facility, their audit records conform to the normalized audit 
> record format.
> 2. Change each component to produce audit records in a common format.
> Approach 1 appears to be more doable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API

2014-05-02 Thread Vinay Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987985#comment-13987985
 ] 

Vinay Shukla commented on HADOOP-10433:
---

Thanks for the sample audit record. I have created a JIRA for normalizing audit 
records: https://issues.apache.org/jira/browse/HADOOP-10569

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HadoopKMSDocsv2.pdf, KMS-doc.pdf
>
>
> (from HDFS-6134 proposal)
> Hadoop KMS is the gateway, for Hadoop and Hadoop clients, to the underlying 
> KMS. It provides an interface that works with existing Hadoop security 
> components (authentication, confidentiality).
> Hadoop KMS will be implemented leveraging the work being done in HADOOP-10141 
> and HADOOP-10177.
> Hadoop KMS will provide an additional implementation of the Hadoop 
> KeyProvider class. This implementation will be a client-server implementation.
> The client-server protocol will be secure:
> * Kerberos HTTP SPNEGO (authentication)
> * HTTPS for transport (confidentiality and integrity)
> * Hadoop ACLs (authorization)
> The Hadoop KMS implementation will not provide additional ACLs for access to 
> encrypted files. For sophisticated access control requirements, HDFS ACLs 
> (HDFS-4685) should be used.
> Basic key administration will be supported by the Hadoop KMS via the already 
> available Hadoop KeyShell command-line tool.
> There are minor changes that must be made to Hadoop KeyProvider functionality:
> * The KeyProvider contract, and the existing implementations, must be 
> thread-safe
> * KeyProvider should have an API to generate the key material internally
> * JavaKeyStoreProvider should use, if present, a password provided via 
> configuration
> * KeyProvider Option and Metadata should include a label (for easier 
> cross-referencing)
> To avoid overloading the underlying KeyProvider implementation, the Hadoop 
> KMS will cache keys using a TTL policy.
> Scalability and high availability of the Hadoop KMS can be achieved by running 
> multiple instances behind a VIP/load balancer. For high availability, the 
> underlying KeyProvider implementation used by the Hadoop KMS must be highly 
> available.
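The TTL-based key caching described above can be sketched as follows. This is a minimal illustration of the policy only; the class and method names are hypothetical and not the actual KMS code:

```python
import time

class TTLKeyCache:
    """Sketch of TTL-based key caching: key material fetched from the
    underlying KeyProvider is reused until its entry expires, limiting
    the load placed on the provider. Illustrative only."""

    def __init__(self, fetch_key, ttl_seconds=600, clock=time.monotonic):
        self._fetch_key = fetch_key   # callable: key name -> key material
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}            # key name -> (expiry time, material)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: no provider call
        key = self._fetch_key(name)   # expired or absent: refetch
        self._entries[name] = (now + self._ttl, key)
        return key
```

A production cache would also bound its size and handle concurrent access; this sketch shows only the TTL policy that reduces calls to the underlying provider.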





[jira] [Created] (HADOOP-10569) Normalize Hadoop Audit Logs

2014-05-02 Thread Vinay Shukla (JIRA)
Vinay Shukla created HADOOP-10569:
-

 Summary: Normalize Hadoop Audit Logs
 Key: HADOOP-10569
 URL: https://issues.apache.org/jira/browse/HADOOP-10569
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Vinay Shukla


It will be very useful to normalize the audit format across the various Hadoop 
components.

A common audit format will help tools understand audit records 
consistently across sub-projects and will be easier for humans to read.

If a new common audit format is devised, it will be useful to consider the 
following W's of audit:

1. What action, and with what results - e.g., what was done: action initiated, API 
invoked, job submitted, etc. - and what the results were (success, failure, etc.)
2. Who - e.g., user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer directly. However, with audit event correlation we 
can provide better context; e.g., a user-submitted Pig script, the MapReduce jobs it 
spawns, and the resulting HDFS reads/writes can be correlated.







[jira] [Updated] (HADOOP-10569) Normalize Hadoop Audit Logs

2014-05-02 Thread Vinay Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Shukla updated HADOOP-10569:
--

Description: 
It will be very useful to normalize the audit format across the various Hadoop 
components.

A common audit format will help tools parse audit records consistently 
across sub-projects and will make audit details easier for humans to interpret.

If a new common audit format is devised, it will be useful to consider the 
following W's of audit:

1. What action, and with what results - e.g., what was done: action initiated, API 
invoked, job submitted, etc. - and what the results were (success, failure, etc.)
2. Who - e.g., user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer directly. However, with audit event correlation we 
can provide better context; e.g., a user-submitted Pig script, the MapReduce jobs it 
spawns, and the resulting HDFS reads/writes can be correlated.



  was:
It will be very useful to normalize the audit format across the various Hadoop 
components.

A common audit format will help tools understand audit records 
consistently across sub-projects and will be easier for humans to read.

If a new common audit format is devised, it will be useful to consider the 
following W's of audit:

1. What action, and with what results - e.g., what was done: action initiated, API 
invoked, job submitted, etc. - and what the results were (success, failure, etc.)
2. Who - e.g., user, proxy user (if available), IP address (if available)
3. When - timestamp
4. Where - which subsystem, component, node name
5. Why - "why" is difficult to answer directly. However, with audit event correlation we 
can provide better context; e.g., a user-submitted Pig script, the MapReduce jobs it 
spawns, and the resulting HDFS reads/writes can be correlated.




> Normalize Hadoop Audit Logs
> ---
>
> Key: HADOOP-10569
> URL: https://issues.apache.org/jira/browse/HADOOP-10569
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Vinay Shukla
>
> It will be very useful to normalize the audit format across the various Hadoop 
> components.
> A common audit format will help tools parse audit records 
> consistently across sub-projects and will make audit details easier for humans 
> to interpret.
> If a new common audit format is devised, it will be useful to consider the 
> following W's of audit:
> 1. What action, and with what results - e.g., what was done: action initiated, 
> API invoked, job submitted, etc. - and what the results were (success, failure, 
> etc.)
> 2. Who - e.g., user, proxy user (if available), IP address (if available)
> 3. When - timestamp
> 4. Where - which subsystem, component, node name
> 5. Why - "why" is difficult to answer directly. However, with audit event 
> correlation we can provide better context; e.g., a user-submitted Pig script, 
> the MapReduce jobs it spawns, and the resulting HDFS reads/writes can be 
> correlated.





[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API

2014-05-01 Thread Vinay Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987339#comment-13987339
 ] 

Vinay Shukla commented on HADOOP-10433:
---

Is there a sample of the audit log record? What fields are audited? Would it be 
useful to have a common audit format across Hadoop?

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HadoopKMSDocsv2.pdf, KMS-doc.pdf
>
>
> (from HDFS-6134 proposal)
> Hadoop KMS is the gateway, for Hadoop and Hadoop clients, to the underlying 
> KMS. It provides an interface that works with existing Hadoop security 
> components (authentication, confidentiality).
> Hadoop KMS will be implemented leveraging the work being done in HADOOP-10141 
> and HADOOP-10177.
> Hadoop KMS will provide an additional implementation of the Hadoop 
> KeyProvider class. This implementation will be a client-server implementation.
> The client-server protocol will be secure:
> * Kerberos HTTP SPNEGO (authentication)
> * HTTPS for transport (confidentiality and integrity)
> * Hadoop ACLs (authorization)
> The Hadoop KMS implementation will not provide additional ACLs for access to 
> encrypted files. For sophisticated access control requirements, HDFS ACLs 
> (HDFS-4685) should be used.
> Basic key administration will be supported by the Hadoop KMS via the already 
> available Hadoop KeyShell command-line tool.
> There are minor changes that must be made to Hadoop KeyProvider functionality:
> * The KeyProvider contract, and the existing implementations, must be 
> thread-safe
> * KeyProvider should have an API to generate the key material internally
> * JavaKeyStoreProvider should use, if present, a password provided via 
> configuration
> * KeyProvider Option and Metadata should include a label (for easier 
> cross-referencing)
> To avoid overloading the underlying KeyProvider implementation, the Hadoop 
> KMS will cache keys using a TTL policy.
> Scalability and high availability of the Hadoop KMS can be achieved by running 
> multiple instances behind a VIP/load balancer. For high availability, the 
> underlying KeyProvider implementation used by the Hadoop KMS must be highly 
> available.





[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API

2014-05-01 Thread Vinay Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987185#comment-13987185
 ] 

Vinay Shukla commented on HADOOP-10433:
---

[~tucu00] What is the audit story for keys? Do we record who performed the various 
key operations? I couldn't find it in the attached PDFs.

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HadoopKMSDocsv2.pdf, KMS-doc.pdf
>
>
> (from HDFS-6134 proposal)
> Hadoop KMS is the gateway, for Hadoop and Hadoop clients, to the underlying 
> KMS. It provides an interface that works with existing Hadoop security 
> components (authentication, confidentiality).
> Hadoop KMS will be implemented leveraging the work being done in HADOOP-10141 
> and HADOOP-10177.
> Hadoop KMS will provide an additional implementation of the Hadoop 
> KeyProvider class. This implementation will be a client-server implementation.
> The client-server protocol will be secure:
> * Kerberos HTTP SPNEGO (authentication)
> * HTTPS for transport (confidentiality and integrity)
> * Hadoop ACLs (authorization)
> The Hadoop KMS implementation will not provide additional ACLs for access to 
> encrypted files. For sophisticated access control requirements, HDFS ACLs 
> (HDFS-4685) should be used.
> Basic key administration will be supported by the Hadoop KMS via the already 
> available Hadoop KeyShell command-line tool.
> There are minor changes that must be made to Hadoop KeyProvider functionality:
> * The KeyProvider contract, and the existing implementations, must be 
> thread-safe
> * KeyProvider should have an API to generate the key material internally
> * JavaKeyStoreProvider should use, if present, a password provided via 
> configuration
> * KeyProvider Option and Metadata should include a label (for easier 
> cross-referencing)
> To avoid overloading the underlying KeyProvider implementation, the Hadoop 
> KMS will cache keys using a TTL policy.
> Scalability and high availability of the Hadoop KMS can be achieved by running 
> multiple instances behind a VIP/load balancer. For high availability, the 
> underlying KeyProvider implementation used by the Hadoop KMS must be highly 
> available.





[jira] [Commented] (HADOOP-10433) Key Management Server based on KeyProvider API

2014-04-29 Thread Vinay Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984908#comment-13984908
 ] 

Vinay Shukla commented on HADOOP-10433:
---

Does KMS support the KMIP protocol 
(https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=kmip)?

Will KMS integrate with Hardware Security Module (HSM) devices such as SafeNet 
Luna and RSA's Data Protection Manager 
(http://www.emc.com/security/rsa-data-protection-manager.htm)?

If KMS does not speak KMIP, how can Hadoop encryption leverage the enterprise-grade 
key management investments many enterprise customers already have?

> Key Management Server based on KeyProvider API
> --
>
> Key: HADOOP-10433
> URL: https://issues.apache.org/jira/browse/HADOOP-10433
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HADOOP-10433.patch, 
> HADOOP-10433.patch, HADOOP-10433.patch, HadoopKMSDocsv2.pdf, KMS-doc.pdf
>
>
> (from HDFS-6134 proposal)
> Hadoop KMS is the gateway, for Hadoop and Hadoop clients, to the underlying 
> KMS. It provides an interface that works with existing Hadoop security 
> components (authentication, confidentiality).
> Hadoop KMS will be implemented leveraging the work being done in HADOOP-10141 
> and HADOOP-10177.
> Hadoop KMS will provide an additional implementation of the Hadoop 
> KeyProvider class. This implementation will be a client-server implementation.
> The client-server protocol will be secure:
> * Kerberos HTTP SPNEGO (authentication)
> * HTTPS for transport (confidentiality and integrity)
> * Hadoop ACLs (authorization)
> The Hadoop KMS implementation will not provide additional ACLs for access to 
> encrypted files. For sophisticated access control requirements, HDFS ACLs 
> (HDFS-4685) should be used.
> Basic key administration will be supported by the Hadoop KMS via the already 
> available Hadoop KeyShell command-line tool.
> There are minor changes that must be made to Hadoop KeyProvider functionality:
> * The KeyProvider contract, and the existing implementations, must be 
> thread-safe
> * KeyProvider should have an API to generate the key material internally
> * JavaKeyStoreProvider should use, if present, a password provided via 
> configuration
> * KeyProvider Option and Metadata should include a label (for easier 
> cross-referencing)
> To avoid overloading the underlying KeyProvider implementation, the Hadoop 
> KMS will cache keys using a TTL policy.
> Scalability and high availability of the Hadoop KMS can be achieved by running 
> multiple instances behind a VIP/load balancer. For high availability, the 
> underlying KeyProvider implementation used by the Hadoop KMS must be highly 
> available.


