[jira] [Created] (HADOOP-9334) Update netty version

2013-02-25 Thread nkeywal (JIRA)
nkeywal created HADOOP-9334:
---

 Summary: Update netty version
 Key: HADOOP-9334
 URL: https://issues.apache.org/jira/browse/HADOOP-9334
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 3.0.0, 2.0.4-beta
Reporter: nkeywal
Priority: Minor


Newer versions are available; HBase, for example, depends on 3.5.9. The 
latest 3.5 release is 3.5.11, and 3.6.3 is out as well.

While there is no point in trying to match the version exactly, things are 
more comfortable if the version gap is minimal, as the dependency is visible 
on the client side as well (i.e. HBase has to choose a version anyway).

Attached is a patch for branch-2.

I haven't run the unit tests, but HBase works fine with Hadoop on Netty 
3.5.9.
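
For reference, a minimal sketch of the kind of change such a patch makes; 
the pom location, groupId, and target version here are assumptions (Netty 
3.3 and later is published under the io.netty groupId):

    <!-- hadoop-project/pom.xml (assumed location of the managed version) -->
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
      <!-- bumped from the older managed 3.x version; 3.5.11.Final and
           3.6.3.Final are the candidates mentioned above -->
      <version>3.5.11.Final</version>
    </dependency>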



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


ANNOUNCEMENT: Project Rhino: Enhanced Data Protection for the Apache Hadoop Ecosystem

2013-02-25 Thread Dey, Avik
Project Rhino

As the Apache Hadoop ecosystem extends into new markets and sees new use cases 
with security and compliance challenges, the benefits of processing sensitive 
and legally protected data with Hadoop must be coupled with protection for 
private information that limits performance impact. Project Rhino 
(https://github.com/intel-hadoop/project-rhino/) is our open source effort 
to enhance the existing data protection capabilities of the Hadoop ecosystem 
to address these challenges, and contribute the code back to Apache.

The core of the Apache Hadoop ecosystem as it is commonly understood is:

- Core: A set of shared libraries
- HDFS: The Hadoop filesystem
- MapReduce: Parallel computation framework
- ZooKeeper: Configuration management and coordination
- HBase: Column-oriented database on HDFS
- Hive: Data warehouse on HDFS with SQL-like access
- Pig: Higher-level programming language for Hadoop computations
- Oozie: Orchestration and workflow management
- Mahout: A library of machine learning and data mining algorithms
- Flume: Collection and import of log and event data
- Sqoop: Imports data from relational databases

These components are all separate projects, so cross-cutting concerns like 
authN, authZ, a consistent security policy framework, a consistent 
authorization model, and audit coverage are only loosely coordinated. Some 
security features expected by our customers, such as encryption, are simply 
missing. Our aim is to take a full-stack view and work with the individual 
projects toward consistent concepts and capabilities, filling gaps as we go.

Our initial goals are:

1) Framework support for encryption and key management

There is currently no framework support for encryption or key management. We 
will add this support into Hadoop Core and integrate it across the ecosystem.
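
To make the idea concrete, here is a minimal Java sketch of the kind of hook 
such a framework could expose. The EncryptingStreams name and the choice of 
AES/CTR are illustrative assumptions, not an existing Hadoop API; key and IV 
lookup would be the job of the proposed key management service:

    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.io.OutputStream;

    // Hypothetical sketch: transparently encrypt a data stream on write.
    // A real framework would fetch the key and IV from a key management
    // service rather than take them as raw parameters.
    public final class EncryptingStreams {
      public static OutputStream wrap(OutputStream raw, byte[] key, byte[] iv)
          throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
            new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(raw, cipher);
      }
    }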

2) A common authorization framework for the Hadoop ecosystem

Each component currently has its own authorization engine. We will abstract the 
common functions into a reusable authorization framework with a consistent 
interface. Where appropriate we will either modify an existing engine to work 
within this framework, or we will plug in a common default engine. Therefore we 
also must normalize how security policy is expressed and applied by each 
component. Core, HDFS, ZooKeeper, and HBase currently support simple access 
control lists (ACLs) composed of users and groups. We see this as a good 
starting point. Where necessary we will modify components so they each offer 
equivalent functionality, and build support into others.
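
As a rough sketch of what such a pluggable engine's interface might look 
like (all names here are illustrative, not an existing API):

    import java.util.Set;

    // Hypothetical common authorization interface. Each component could
    // adapt its existing engine to this, or plug in a shared default
    // ACL-based engine that evaluates users and groups.
    public interface AuthorizationEngine {
      enum Action { READ, WRITE, ADMIN }

      // resource is component-specific: an HDFS path, a ZooKeeper znode,
      // an HBase table or column family, etc.
      boolean isAuthorized(String user, Set<String> groups,
                           String resource, Action action);
    }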

3) Token based authentication and single sign on

Core, HDFS, ZooKeeper, and HBase currently support Kerberos authentication at 
the RPC layer, via SASL. However, this does not convey valuable attributes 
such as group membership, classification level, organizational identity, or 
user-defined attributes. Hadoop components must interrogate external 
resources to discover these attributes, which is problematic at scale. 
There is also no consistent delegation model. HDFS has a simple delegation 
capability, and only Oozie can take limited advantage of it. We will implement 
a common token based authentication framework to decouple internal user and 
service authentication from external mechanisms used to support it (like 
Kerberos).
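
A hypothetical sketch of the attribute-carrying token such a framework might 
issue (the field names and shape are assumptions):

    import java.util.Collections;
    import java.util.Map;

    // Hypothetical authentication token: identity plus attributes asserted
    // once by a token authority, so each component need not interrogate
    // external resources (Kerberos, LDAP, etc.) on every request.
    public final class AuthToken {
      private final String principal;
      private final Map<String, String> attributes; // e.g. groups, clearance
      private final long expiryMillis;

      public AuthToken(String principal, Map<String, String> attributes,
                       long expiryMillis) {
        this.principal = principal;
        this.attributes = Collections.unmodifiableMap(attributes);
        this.expiryMillis = expiryMillis;
      }

      public String getPrincipal() { return principal; }
      public String getAttribute(String name) { return attributes.get(name); }
      public boolean isExpired(long now) { return now > expiryMillis; }
    }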

4) Extend HBase support for ACLs to the cell level

Currently HBase supports setting access controls at the table or column 
family level. However, many use cases would benefit from the additional 
capability to do this on a per-cell basis. In fact, for many users dealing 
with sensitive information, this ability is crucial.
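
For illustration, a per-cell check might layer on top of the existing 
table/column-family checks roughly like this (a hypothetical sketch; none of 
these names exist in HBase today):

    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of a per-cell read check. A cell with no ACL of
    // its own falls through to the coarser table/column-family decision.
    public final class CellAclChecker {
      private final Map<String, Set<String>> cellReaders; // cell key -> users

      public CellAclChecker(Map<String, Set<String>> cellReaders) {
        this.cellReaders = cellReaders;
      }

      public boolean mayRead(String user, String cellKey, boolean tableAllows) {
        Set<String> readers = cellReaders.get(cellKey);
        return readers == null ? tableAllows : readers.contains(user);
      }
    }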

5) Improve audit logging

Audit messages from the various Hadoop components do not use a unified, or 
even consistent, format. This makes analyzing logs to verify compliance or 
to take corrective action difficult. We will build a common audit logging 
facility as part of the common authorization framework work. We will also 
build a set of common audit log processing tools for transforming audit 
logs into different industry-standard formats, supporting compliance 
verification, and triggering responses to policy violations.
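
A sketch of the kind of uniform record the common facility might emit; the 
field set and the key=value line format are assumptions:

    // Hypothetical uniform audit record, emitted one per line as key=value
    // pairs so downstream tools can parse it and transform it into other
    // industry-standard formats.
    public final class AuditRecord {
      public static String format(long timestampMillis, String user,
                                  String component, String action,
                                  String resource, boolean allowed) {
        return String.format(
            "ts=%d user=%s component=%s action=%s resource=%s allowed=%b",
            timestampMillis, user, component, action, resource, allowed);
      }
    }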

Current JIRAs:

As part of this ongoing effort we are contributing our work to date against 
the JIRAs listed below. As you may appreciate, the goals of Project Rhino 
cover a number of different Apache projects; the scope of work is 
significant and likely only to increase as we get additional community 
input. We also appreciate that others in the Apache community may already be 
working on some of this, or may be interested in contributing to it. If so, 
we look forward to partnering with you in Apache to accelerate this effort 
so the community can see the benefits of our collective efforts sooner. You 
can also find a more detailed version of this announcement at Project Rhino 
(https://github.com/intel-hadoop/project-rhino/).

Please feel free to 

Re: ANNOUNCEMENT: Project Rhino: Enhanced Data Protection for the Apache Hadoop Ecosystem

2013-02-25 Thread Konstantin Boudnik
[yanking away most of the cross-posts...]

An interesting cross-component project, Avik. Any plans to incubate it in Apache?

Cos

On Mon, Feb 25, 2013 at 11:46 PM, Dey, Avik wrote:
 [the full Project Rhino announcement, quoted in its entirety; trimmed]

Re: ANNOUNCEMENT: Project Rhino: Enhanced Data Protection for the Apache Hadoop Ecosystem

2013-02-25 Thread Avik Dey
[thanks, appreciate your doing that; the announcement itself was
cross-posted as outreach]

Thanks Cos.

As I see the work currently, I believe most, if not all, of it will be done
against JIRAs in the individual projects, similar to the JIRAs posted at
https://github.com/intel-hadoop/project-rhino. If we get to a point where
some of the future work needs a home outside the individual projects,
happy to incubate that work in Apache.

~avik



On Mon, Feb 25, 2013 at 4:18 PM, Konstantin Boudnik c...@apache.org wrote:

 [yanking away most of the cross-posts...]

 An interesting cross component project Avik. Any plans to incubate it in
 Apache?

 Cos

 On Mon, Feb 25, 2013 at 11:46 PM, Dey, Avik wrote:
  [the full Project Rhino announcement, quoted in its entirety; trimmed]

[jira] [Created] (HADOOP-9335) Including UNIX like sort options for ls shell command

2013-02-25 Thread Arjun K R (JIRA)
Arjun K R created HADOOP-9335:
-

 Summary: Including UNIX like sort options for ls shell command
 Key: HADOOP-9335
 URL: https://issues.apache.org/jira/browse/HADOOP-9335
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 0.20.2
Reporter: Arjun K R
Priority: Minor


Currently the ls shell command does not support sort options. It should 
include the following UNIX-like sort options:
-t : sort by modification time
-S : sort by file size
-r : reverse the sort order
-u : sort by access time
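
A minimal sketch of how the comparator for these options might be selected 
inside FsShell's ls implementation (flag parsing and command wiring omitted; 
the class name is illustrative):

    import java.util.Comparator;
    import org.apache.hadoop.fs.FileStatus;

    // Sketch: choose a comparator for the requested sort option, then
    // flip it if -r (reverse) was also given. Defaults follow UNIX ls:
    // -t and -u list newest first, -S lists largest first.
    public final class LsSorting {
      public static Comparator<FileStatus> forOption(char opt, boolean reverse) {
        Comparator<FileStatus> cmp;
        switch (opt) {
          case 't':
            cmp = Comparator.comparingLong(FileStatus::getModificationTime).reversed();
            break;
          case 'S':
            cmp = Comparator.comparingLong(FileStatus::getLen).reversed();
            break;
          case 'u':
            cmp = Comparator.comparingLong(FileStatus::getAccessTime).reversed();
            break;
          default: // name order, as plain ls does
            cmp = Comparator.comparing((FileStatus f) -> f.getPath().getName());
        }
        return reverse ? cmp.reversed() : cmp;
      }
    }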

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira