[jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)

2018-04-03 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424559#comment-16424559
 ] 

Sanjay Radia commented on HDFS-10419:
-

While I prefer HDSS, I would gladly let Anu, who has done the bulk of the heavy 
lifting in this project, have the final say on the name (unless his name 
choice were truly horrible, which it isn't). The most passionate debates in any 
project are always about the name :)

+1  HDDS -  Hadoop Distributed Data Store.

> Building HDFS on top of new storage layer (HDSL)
> 
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.
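To make the flow concrete, a minimal sketch of the read path outlined above; 
the interfaces here are hypothetical placeholders for APIs still being 
designed, not actual HDFS or HDSL classes:

{code:java}
// Illustrative only: NamespaceClient/StorageContainerClient are hypothetical
// stand-ins for the client APIs described above, not real HDFS/HDSL classes.
public class ContainerAwareReadSketch {
  interface NamespaceClient {
    java.util.List<Long> getBlockIds(String path);   // NN: namespace only
  }
  interface StorageContainerClient {
    byte[] readBlock(long blockId);                   // container layer: data
  }

  private final NamespaceClient namenode;
  private final StorageContainerClient containers;

  ContainerAwareReadSketch(NamespaceClient nn, StorageContainerClient sc) {
    this.namenode = nn;
    this.containers = sc;
  }

  byte[] read(String path) throws java.io.IOException {
    java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
    // 1. The NameNode resolves the file to a list of block IDs only.
    for (long blockId : namenode.getBlockIds(path)) {
      // 2. Each block ID is the key used against the storage container layer.
      out.write(containers.readBlock(blockId));
    }
    return out.toByteArray();
  }
}
{code}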



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)

2018-03-30 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420318#comment-16420318
 ] 

Sanjay Radia commented on HDFS-10419:
-

I prefer HDSS: Hadoop Distributed Storage System

 

sanjay

> Building HDFS on top of new storage layer (HDSL)
> 
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)

2018-03-16 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403116#comment-16403116
 ] 

Sanjay Radia commented on HDFS-10419:
-

In the "[VOTE] Merging branch HDFS-7240 to trunk" thread [~andrew.wang] asked:
{quote}*Sanjay says*:
- NN on top of HDSL, where the NN uses the new block layer (both Daryn and 
Owen acknowledge the benefit of the new block layer). We have two choices here:

** a) Evolve the NN so that it can interact with both the old and new block 
layers.

** b) Fork and create a new NN that works only with the new block layer; the 
old NN will continue to work with the old block layer.

There are trade-offs, but clearly the 2nd option has the least impact on the 
old HDFS code.

*Andrew asks*: Are you proposing that we pursue the 2nd option to integrate 
HDSL with HDFS?
{quote}
Originally I would have preferred (a), but Owen made a strong case for (b) in 
my discussions with him last week. I believe the choice between (a) and (b) 
will depend strongly on what we want to do. For example, if we do milestone-1, 
get the 2x scalability, and decide to stop there, then clearly go with option 
(a) - it will require little refactoring and one can run old and new HDFS 
side-by-side. If we are planning to follow up milestone-1 with, say, caching 
the working set of the namespace, then forking the NN code (i.e. option b) 
might be better, and the new NN will have to keep pulling over features and 
bug fixes from the old NN. Konstantine has proposed other alternatives and we 
would evaluate (a) or (b) for his alternative. I am not locked into any 
particular path or how we would do it.

 

> Building HDFS on top of new storage layer (HDSL)
> 
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7240) Scaling HDFS

2018-01-30 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Summary: Scaling HDFS  (was: Object store in HDFS)

> Scaling HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS Scalability-v2.pdf, 
> HDFS-7240.001.patch, HDFS-7240.002.patch, HDFS-7240.003.patch, 
> HDFS-7240.003.patch, HDFS-7240.004.patch, HDFS-7240.005.patch, 
> HDFS-7240.006.patch, HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> [^HDFS Scalability-v2.pdf] describes areas where HDFS does well and its 
> scaling challenges and how to address those challenges. Scaling HDFS requires 
> scaling the namespace layer and also the block layer. _This jira provides a 
> new block layer,  Hadoop Distributed Storage Layer (HDSL), that scales the 
> block layer by grouping blocks into containers thereby reducing the 
> block-to-location map and also reducing the number of block reports and their 
> processing_
> _A scalable namespace can be put on top of this scalable block layer:_
>  * _HDFS-10419 describes how the existing NN can be modified to use the new 
> block layer._
>  * _HDFS-13074  also provides, as an_ *_interim_* _step; a scalable flat 
> Key-Value namespace  on top of the new block layer; while it does not provide 
> the HDFS API, it does support the Hadoop FS APIs (Hadoop FileSystem, 
> FileContext)._
>  
> Old Description
> This jira proposes to add object store capabilities into HDFS. 
>  As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
>  In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7240) Object store in HDFS

2018-01-30 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Description: 
[^HDFS Scalability-v2.pdf] describes areas where HDFS does well and its scaling 
challenges and how to address those challenges. Scaling HDFS requires scaling 
the namespace layer and also the block layer. _This jira provides a new block 
layer,  Hadoop Distributed Storage Layer (HDSL), that scales the block layer by 
grouping blocks into containers thereby reducing the block-to-location map and 
also reducing the number of block reports and their processing_

_A scalable namespace can be put on top of this scalable block layer:_
 * _HDFS-10419 describes how the existing NN can be modified to use the new 
block layer._
 * _HDFS-13074  also provides, as an_ *_interim_* _step; a scalable flat 
Key-Value namespace  on top of the new block layer; while it does not provide 
the HDFS API, it does support the Hadoop FS APIs (Hadoop FileSystem, 
FileContext)._

 

Old Description

This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
generic storage layer. Using the Block Pool abstraction, new kinds of 
namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
storage, but independent of namespace metadata.

I will soon update with a detailed design document.

  was:
[Scaling HDFS| x] describes areas where HDFS does well and its scaling 
challenges and how to address those challenges. Scaling HDFS requires scaling 
the namespace layer and also the block layer. _This jira provides a new block 
layer,  Hadoop Distributed Storage Layer (HDSL), that scales the block layer by 
grouping blocks into containers thereby reducing the block-to-location map and 
also reducing the number of block reports and their processing_

_A scalable namespace can be put on top of this scalable block layer:_
 * _HDFS-10419 describes how the existing NN can be modified to use the new 
block layer._
 * _HDFS-13074  also provides,  as an_ *_interim_* _step; a scalable  flat KV 
namespace  on top of the new block layer; while it does not provide the HDFS 
API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._ 

 

 

Old Description

This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
generic storage layer. Using the Block Pool abstraction, new kinds of 
namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
storage, but independent of namespace metadata.

I will soon update with a detailed design document.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS Scalability-v2.pdf, 
> HDFS-7240.001.patch, HDFS-7240.002.patch, HDFS-7240.003.patch, 
> HDFS-7240.003.patch, HDFS-7240.004.patch, HDFS-7240.005.patch, 
> HDFS-7240.006.patch, HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> [^HDFS Scalability-v2.pdf] describes areas where HDFS does well and its 
> scaling challenges and how to address those challenges. Scaling HDFS requires 
> scaling the namespace layer and also the block layer. _This jira provides a 
> new block layer,  Hadoop Distributed Storage Layer (HDSL), that scales the 
> block layer by grouping blocks into containers thereby reducing the 
> block-to-location map and also reducing the number of block reports and their 
> processing_
> _A scalable namespace can be put on top of this scalable block layer:_
>  * _HDFS-10419 describes how the existing NN can be modified to use the new 
> block layer._
>  * _HDFS-13074  also provides, as an_ *_interim_* _step; a scalable flat 
> Key-Value namespace  on top of the new block layer; while it does not provide 
> the HDFS API, it does support the Hadoop FS APIs (Hadoop FileSystem, 
> FileContext)._
>  
> Old Description
> This jira proposes to add object store capabilities into HDFS. 
>  As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
>  In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by 

[jira] [Updated] (HDFS-7240) Object store in HDFS

2018-01-30 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Description: 
[Scaling HDFS| x] describes areas where HDFS does well and its scaling 
challenges and how to address those challenges. Scaling HDFS requires scaling 
the namespace layer and also the block layer. _This jira provides a new block 
layer,  Hadoop Distributed Storage Layer (HDSL), that scales the block layer by 
grouping blocks into containers thereby reducing the block-to-location map and 
also reducing the number of block reports and their processing_

_A scalable namespace can be put on top of this scalable block layer:_
 * _HDFS-10419 describes how the existing NN can be modified to use the new 
block layer._
 * _HDFS-13074  also provides,  as an_ *_interim_* _step; a scalable  flat KV 
namespace  on top of the new block layer; while it does not provide the HDFS 
API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._ 

 

 

Old Description

This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
generic storage layer. Using the Block Pool abstraction, new kinds of 
namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
storage, but independent of namespace metadata.

I will soon update with a detailed design document.

  was:
[^HDFS Scalability-v2.pdf] describes areas where HDFS does well and its scaling 
challenges and how to address those challenges. Scaling HDFS requires scaling 
the namespace layer and also the block layer. _This jira provides a new block 
layer,  Hadoop Distributed Storage Layer (HDSL), that scales the block layer by 
grouping blocks into containers thereby reducing the block-to-location map and 
also reducing the number of block reports and their processing_

_A scalable namespace can be put on top of this scalable block layer:_
 * _HDFS-10419 describes how the existing NN can be modified to use the new 
block layer._
 * _HDFS-13074  also provides,  as an_ *_interim_* _step; a scalable  flat KV 
namespace  on top of the new block layer; while it does not provide the HDFS 
API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._ 

 

 

Old Description

This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
generic storage layer. Using the Block Pool abstraction, new kinds of 
namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
storage, but independent of namespace metadata.

I will soon update with a detailed design document.


> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS Scalability-v2.pdf, 
> HDFS-7240.001.patch, HDFS-7240.002.patch, HDFS-7240.003.patch, 
> HDFS-7240.003.patch, HDFS-7240.004.patch, HDFS-7240.005.patch, 
> HDFS-7240.006.patch, HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> [Scaling HDFS| x] describes areas where HDFS does well and its scaling 
> challenges and how to address those challenges. Scaling HDFS requires scaling 
> the namespace layer and also the block layer. _This jira provides a new block 
> layer,  Hadoop Distributed Storage Layer (HDSL), that scales the block layer 
> by grouping blocks into containers thereby reducing the block-to-location map 
> and also reducing the number of block reports and their processing_
> _A scalable namespace can be put on top of this scalable block layer:_
>  * _HDFS-10419 describes how the existing NN can be modified to use the new 
> block layer._
>  * _HDFS-13074  also provides,  as an_ *_interim_* _step; a scalable  flat KV 
> namespace  on top of the new block layer; while it does not provide the HDFS 
> API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._ 
>  
>  
> Old Description
> This jira proposes to add object store capabilities into HDFS. 
>  As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
>  In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA

[jira] [Updated] (HDFS-7240) Object store in HDFS

2018-01-30 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Description: 
[^HDFS Scalability-v2.pdf] describes areas where HDFS does well and its scaling 
challenges and how to address those challenges. Scaling HDFS requires scaling 
the namespace layer and also the block layer. _This jira provides a new block 
layer,  Hadoop Distributed Storage Layer (HDSL), that scales the block layer by 
grouping blocks into containers thereby reducing the block-to-location map and 
also reducing the number of block reports and their processing_

_A scalable namespace can be put on top of this scalable block layer:_
 * _HDFS-10419 describes how the existing NN can be modified to use the new 
block layer._
 * _HDFS-13074  also provides,  as an_ *_interim_* _step; a scalable  flat KV 
namespace  on top of the new block layer; while it does not provide the HDFS 
API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._ 

 

 

Old Description

This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
generic storage layer. Using the Block Pool abstraction, new kinds of 
namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
storage, but independent of namespace metadata.

I will soon update with a detailed design document.

  was:
This jira proposes to add object store capabilities into HDFS. 
As part of the federation work (HDFS-1052) we separated block storage as a 
generic storage layer. Using the Block Pool abstraction, new kinds of 
namespaces can be built on top of the storage layer i.e. datanodes.
In this jira I will explore building an object store using the datanode 
storage, but independent of namespace metadata.

I will soon update with a detailed design document.






> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS Scalability-v2.pdf, 
> HDFS-7240.001.patch, HDFS-7240.002.patch, HDFS-7240.003.patch, 
> HDFS-7240.003.patch, HDFS-7240.004.patch, HDFS-7240.005.patch, 
> HDFS-7240.006.patch, HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> [^HDFS Scalability-v2.pdf] describes areas where HDFS does well and its 
> scaling challenges and how to address those challenges. Scaling HDFS requires 
> scaling the namespace layer and also the block layer. _This jira provides a 
> new block layer,  Hadoop Distributed Storage Layer (HDSL), that scales the 
> block layer by grouping blocks into containers thereby reducing the 
> block-to-location map and also reducing the number of block reports and their 
> processing_
> _A scalable namespace can be put on top of this scalable block layer:_
>  * _HDFS-10419 describes how the existing NN can be modified to use the new 
> block layer._
>  * _HDFS-13074  also provides,  as an_ *_interim_* _step; a scalable  flat KV 
> namespace  on top of the new block layer; while it does not provide the HDFS 
> API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._ 
>  
>  
> Old Description
> This jira proposes to add object store capabilities into HDFS. 
>  As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
>  In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7240) Object store in HDFS

2018-01-30 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Attachment: HDFS Scalability-v2.pdf

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS Scalability-v2.pdf, 
> HDFS-7240.001.patch, HDFS-7240.002.patch, HDFS-7240.003.patch, 
> HDFS-7240.003.patch, HDFS-7240.004.patch, HDFS-7240.005.patch, 
> HDFS-7240.006.patch, HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers

2018-01-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336903#comment-16336903
 ] 

Sanjay Radia commented on HDFS-10419:
-

{quote}As a side note, I didn't understand why you used 50MB blocks in your 
math. The default is 128MB and many people run HDFS with 512MB blocks.
{quote}
While the default block size is 128MB, in many clusters, including the ones at 
Yahoo (at least in 2011 when I left), the actual average block size was 50MB 
because most files had only one block and even that first block was not full.

> Building HDFS on top of Ozone's storage containers
> --
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-22 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334743#comment-16334743
 ] 

Sanjay Radia commented on HDFS-12990:
-

For this Jira, I can go either way.
 In this specific case there is a very simple workaround for the 2.x customer 
if he reads the release notes and adds a config to use his old port number. 
Anyone going from 2.x to 3.0 should read the release notes. However, I can see 
that other, harder cases may come up over the next couple of months where there 
isn’t such a convenient workaround. For example, we are likely to come up with 
new issues when we do upgrade testing, and there will be similar debates, 
including arguments on the importance of supporting rolling upgrade from 2.x to 
3.0. 
  
 To avoid such debates I suggest we amend our guideline along the following 
lines:
  
 On each new major release, until a tag such as “stable” is assigned by the 
PMC, we allow changes that make the software *compatible with the previous 
major release, even if that change breaks compatibility within the major 
release.* These changes may be for cases where we accidentally broke 
compatibility, or even where we consciously made a change without fully 
understanding or appreciating the impact of the incompatible change. The PMC 
will determine the testing criteria for assigning the stable tag.
  
  
 The above guideline will reduce such debates during the GA of a major release. 
Further, users will realize that they need to wait for the stable tag before 
going to production. One might argue that a GA release without a “stable” tag 
is merely a glorified beta or a GA candidate. Yes, I agree. Perhaps in this case 
we rushed from beta to GA a little too early. I think the above forces the PMC 
to seriously determine GA testing criteria and validate whether or not they 
have been met.
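
For reference, the simple workaround mentioned above is a one-line server-side 
config; a sketch (the hostname is a placeholder):

{code:xml}
<!-- hdfs-site.xml: pin the NN RPC endpoint to the pre-3.0 port explicitly,
     instead of relying on the release default (hostname is a placeholder). -->
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>nn.example.com:8020</value>
</property>
{code}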

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all 
> default ports out of the ephemeral range, which is much appreciated by 
> admins. As part of that change, we also modified the NN RPC port from the 
> famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include the port 
> number. But considering the downstream impact, instead of requiring all of 
> them to change their code and configs, it would be a much better experience 
> to leave the NN port unchanged. This will benefit Hadoop 3 adoption and ease 
> unnecessary upgrade burdens.
> It is of course incompatible, but given 3.0.0 is just out, IMO it is worth 
> switching the port back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers

2017-12-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302631#comment-16302631
 ] 

Sanjay Radia commented on HDFS-10419:
-

Correct, the NN is not part of the consensus with the new block-container layer 
-- that is actually a good thing: the two layers are decoupled, which helps 
reduce/eliminate global locks in state management in the NN. In current HDFS 
the NN drives the consensus on failures. Also, upon client death the NN closes 
an open file after establishing its length; in the new world it will still have 
to close an open file, but, as you state, via an explicit request asking the 
container replicas for the block length.
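
A hypothetical sketch of that close path (interface and method names below are 
illustrative placeholders, not the actual HDSL client API):

{code:java}
// Hypothetical sketch: on client death / lease expiry, the NN asks the
// container replicas for the committed length of the last block and closes
// the file at that length. Interface names are illustrative only.
public class LeaseRecoverySketch {
  interface ContainerReplica {
    long committedBlockLength(long blockId);   // length agreed via Raft
  }

  /** Close an abandoned file at the length reported by the block's replicas. */
  static long recoverLastBlockLength(long blockId,
                                     java.util.List<ContainerReplica> replicas) {
    long length = -1;
    for (ContainerReplica r : replicas) {
      // With Raft-consistent containers, every replica should report the same
      // committed length; take the first successful answer.
      length = r.committedBlockLength(blockId);
      if (length >= 0) {
        break;
      }
    }
    return length;   // the NN then finalizes the file at this length
  }
}
{code}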

> Building HDFS on top of Ozone's storage containers
> --
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12952) Change OzoneFS's semantics to allow readers to see file content while being written

2017-12-20 Thread Sanjay Radia (JIRA)
Sanjay Radia created HDFS-12952:
---

 Summary: Change OzoneFS's semantics to allow readers to see file 
content while being written
 Key: HDFS-12952
 URL: https://issues.apache.org/jira/browse/HDFS-12952
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sanjay Radia
Assignee: Anu Engineer


Currently the Ozone KSM gives visibility to a file only when the file is 
closed, which is similar to the S3 FS. OzoneFS should allow partial file 
visibility while a file is being written, to match HDFS.

Note that this should be fairly straightforward because the block-container 
layer maintains block-length consistency as a block is being written, even 
under failures, due to its use of Raft.
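
For comparison, a minimal sketch of the HDFS behavior OzoneFS should match, 
using the standard FileSystem API (the path is illustrative, and the visibility 
guarantee assumes an HDFS-backed FileSystem in the Configuration):

{code:java}
// Data hflush()-ed by a writer is readable by other clients before close().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartialVisibilitySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/partial-visibility-demo");

    FSDataOutputStream out = fs.create(file, true);
    out.writeBytes("first part of the file");
    out.hflush();                       // make the bytes visible to new readers

    // A second reader can already see the flushed bytes, before close().
    try (FSDataInputStream in = fs.open(file)) {
      byte[] buf = new byte[1024];
      int n = in.read(buf);
      System.out.println("visible bytes before close: " + n);
    }

    out.close();                        // finalize the file
    fs.close();
  }
}
{code}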



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers

2017-12-20 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298728#comment-16298728
 ] 

Sanjay Radia commented on HDFS-10419:
-

The new block-container layer allows partial visibility of a block even before 
the block has been finalized and closed. It maintains block-length consistency 
without the help of the namespace layer (KSM or NN) as a block is being 
written, even under failures, due to its use of Raft. Hence the NN, when 
plugged into the new block-container layer, will be fine - we can get the HDFS 
semantics.

You are right that the Ozone KSM gives visibility to a file only when the file 
is closed, which is similar to the S3 FS. I will create a jira to fix that so 
that OzoneFS matches HDFS for partial file visibility as a file is being 
written.




> Building HDFS on top of Ozone's storage containers
> --
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-12-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297454#comment-16297454
 ] 

Sanjay Radia commented on HDFS-7240:


One of the issues raised is that connecting the NN to the new block-container 
layer will be very difficult because removing the FSN/BM lock is challenging.
I have attached to HDFS-10419 a doc, [Evolving NN using new block container 
layer|https://issues.apache.org/jira/secure/attachment/12902931/Evolving%20NN%20using%20new%20block-container%20layer.pdf],
that describes 2 milestones for connecting the NN to the new 
block-container layer. The first one does *not* require removing the FSN/BM 
lock and still gives close to 2x scalability because the block map (which 
becomes the container map) is reduced significantly.

I would also like to point out (as stated above and in the doc) that the new 
block-container layer keeps a consistent state using Raft and hence eliminates 
the coupling between the namespace layer and the block layer, and that the 2nd 
milestone of removing the FSN/BM lock is much easier with the new block layer. 
Even if you disagree with my lock argument, the first milestone still gets good 
scalability without removing the lock.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10419) Building HDFS on top of Ozone's storage containers

2017-12-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-10419:

Attachment: Evolving NN using new block-container layer.pdf

I have attached a doc that describes how the existing NN can be modified to 
plug in the new block-container layer provided by HDFS-7240. Two key 
milestones are described. The first milestone is where the Container Map is 
kept in the NN (this gets us to almost 2x scalability since the container map 
is 1/40th of the original block map, assuming an average actual block size of 
50MB); this milestone does NOT require removing the FSN/BM lock. The 2nd 
milestone is where the container map and block management are completely 
removed, which gets us to 2x scalability. After the 2nd milestone, the NN can 
be evolved in several directions for further scalability.
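
As a back-of-the-envelope check of the 1/40th figure (the ~2GB container size 
below is an assumption implied by these numbers, not a value taken from the 
doc):

{code:java}
public class ContainerMapEstimate {
  public static void main(String[] args) {
    long avgBlockSizeMB = 50;      // observed average block size cited above
    long containerSizeMB = 2048;   // assumed ~2GB containers (illustrative)
    long blocksPerContainer = containerSizeMB / avgBlockSizeMB;
    // ~40 blocks per container, so the container map has roughly 1/40th the
    // entries of the original block map.
    System.out.println("blocks per container ~ " + blocksPerContainer);
  }
}
{code}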


> Building HDFS on top of Ozone's storage containers
> --
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: Evolving NN using new block-container layer.pdf
>
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from block 
> management, which includes maintaining the block-to-DataNode mapping, 
> receiving full/incremental block reports, tracking block states 
> (under/over/mis-replicated), and participating in every write pipeline to 
> guarantee data consistency. This work brings a high memory footprint and 
> makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to 
> convert the BlockManager into a service. If we can build HDFS on top of the 
> storage container layer, we not only separate the BlockManager out of the 
> NameNode but also replace it with a new, distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploratory stage. We can run 
> this experiment in a feature branch so that interested people can get 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7240) Object store in HDFS

2017-12-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Attachment: (was: Evolving NN using new block-container layer.pdf)

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-7240) Object store in HDFS

2017-12-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Comment: was deleted

(was: I have attached a doc that describes how the existing NN can be modified 
to plug in the new block-container layer provided by HDFS-7240. Two key 
milestones are described: The first milestone is where the Container Map is 
kept in the NN (this gets us to almost 2x scalability since the container map 
is 1/40th of the original block map, assuming an *average actual* block size 
of 50MB); this milestone does NOT require removing the FSN/BM lock. The 2nd 
milestone is where the container map and block management are completely 
removed, which gets us to 2x scalability. After the 2nd milestone, the NN can 
be evolved in several directions for further scalability.)

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, 
> MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7240) Object store in HDFS

2017-12-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Attachment: Evolving NN using new block-container layer.pdf

I have attached a doc that describes how the existing NN can be modified to 
plug in the new block-container layer provided by HDFS-7240. Two key 
milestones are described: The first milestone is where the Container Map is 
kept in the NN (this gets us to almost 2x scalability since the container map 
is 1/40th of the original block map, assuming an *average actual* block size 
of 50MB); this milestone does NOT require removing the FSN/BM lock. The 2nd 
milestone is where the container map and block management are completely 
removed, which gets us to 2x scalability. After the 2nd milestone, the NN can 
be evolved in several directions for further scalability.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Evolving NN using new block-container layer.pdf, HDFS 
> Scalability and Ozone.pdf, HDFS-7240.001.patch, HDFS-7240.002.patch, 
> HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, 
> HDFS-7240.005.patch, HDFS-7240.006.patch, MeetingMinutes.pdf, 
> Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-21 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261791#comment-16261791
 ] 

Sanjay Radia commented on HDFS-7240:


Ozone Cloudera Meeting. Date: Thursday, November 16th, 2017
Location: online conferencing

Attendees: ATM, Andrew, Anu, Aaron Fabbri, Jitendra, Sanjay, Sean Mackrory, 
other listeners on the phone

Main discussion centered around:
* Wouldn't Ozone be better off as a separate project?
* Why should it be merged now?

Discussion: (This incorporates Andrew’s minutes and adds to them.)

* Anu: Don't want to have this separate since it confuses people about the 
long-term vision of Ozone. It's intended as block management for HDFS.
   * Andrew: In its current state, Ozone cannot be plugged into the NN as the 
BM layer, so it seems premature to merge. Can't benefit existing users, and 
they can't test it.
   * Response: The Ozone block layer is at a good integration point, and we 
want to move on with the NameNode integration as the new block layer. The 
benefit via the KV namespace / FileSystem API is there and is completely 
usable for Hive and Spark apps.
   * Andrew: We can do the FSN/BM lock split without merging Ozone. Separate 
efforts. This lock split is also a major effort by itself, and is a dangerous 
change. It's something that should be baked in production.
  * Sanjay: Agree that the lock split should be done in branch. But disagree on 
how hard it will be. The split was hard in past but will be easier with new 
block layer: one of the key reasons for the coupling of Block-layer to 
Namespace layer is that the block length of the each replica at block close 
time, esp under failures, has to be consistent. This is done in the central NN 
today (due to lack of raft/paxos like protocol in the original block layer). 
The block-container layer uses raft for consistency and no longer needs a 
central agent like the NN. Then new block-layers built-in consistent state 
management simplifies the  separation.
* Sanjay: Ozone developers "willing to take the hit" of the slow Hadoop release 
cadence. Want to make this part of HDFS since it's easier for users to test and 
consume without installing a new cluster.
   * ATM: Can still share the same hardware, and run the Ozone daemons 
alongside.
   * Sanjay countered this 
* Sanjay: Want to keep Ozone block management inside the Datanode process to 
enable various synergies such as sharing the new netty-based protocol engine or 
fast-copy between HDFS and Ozone. Not all data needs all the HDFS features like 
encryption, erasure coding, etc., and this data could be stored in Ozone.
   * Andrew: This fast-copy hasn't been implemented or discussed yet. Unclear 
if it'll work at all with existing HDFS block management. Won't work with 
encryption or erasure coding. Not clear whether it requires being in the same 
DN process even.
   * It does not have to work with encryption and EC to give value. It can work 
with non-encrypted and non-EC data, which make up the majority of blocks in 
most Hadoop clusters. We will provide a design of the shallow-copy. 
Sanjay/Anu: Ozone is also useful to test with just the key-value interface. 
It's a Hadoop-compatible FileSystem, so many apps such as Hive and Spark can 
also work on Ozone since they have ensured that they work well on a flat KV 
namespace.
   * Andrew: If it provides a new API and doesn't support the HDFS feature-set, 
doesn't this support it being its own project?
  * Sanjay - It provides the EXISTING Hadoop FileSystem interface now. Note 
customers are used to having different parts of the namespace(s) with different 
features: customers have asked for zones with different features enabled [see 
summary - to avoid duplication].
 * AaronF: Ozone is a lot of new code and Hadoop already has so much code. It 
is better to have separate projects and not add to Hadoop/HDFS.
Sanjay: Agree it is a lot of code. Sometimes we have to add significant new 
code for a project to move forward. We have tried to incrementally work around 
HDFS scaling, the NN’s manageability, and slow startup issues. This new code 
base fundamentally moves us forward in addressing these long-standing issues. 
Besides, this “lots of new code” argument could be used later to prevent the 
merge of the projects.


Summary: 
There is agreement that the new block-container layer is a good way to solve 
the block scaling issue of HDFS. There is no consensus on merging the branch in 
vs forking Ozone into a new project. The main objection to merging into HDFS is 
that integrating the new block-container layer with the existing NN will be a 
very hard project since the lock split in the NN is very challenging.

Cloudera’s team perspective: (taken from Andrew’s minutes)
* Ozone could be its own project and integrated later, or remain on an HDFS 
branch. There are benefits to Ozone being a separate project. Can release 
faster, iterate more quickly on feedback, and 

[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers

2017-11-03 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238379#comment-16238379
 ] 

Sanjay Radia commented on HDFS-10419:
-

HDFS-5389 describes one approach of building a NN that scales its namespace 
better than the current NN. 
It proposes caching only the working-set namespace in memory; also see [HUG - 
Removing Namenode's 
Limitation|https://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation].
 Independent studies have also analysed LRU caching of HDFS metadata: 
[Metadata Traces and Workload Models for Evaluating Big Storage 
Systems|https://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation].
 This approach works because, in spite of having large amounts of data (say 
data for the last five years), most of the data that is accessed is recent (say 
the last 3-9 months); hence the working set can fit in memory.
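A minimal sketch of the working-set idea (illustrative only; the real 
partial-namespace work would cache inode and block metadata, not strings, and 
would fault misses in from a backing store):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Tiny LRU cache: only the recently accessed part of the namespace stays in
// memory; colder metadata is evicted and would be reloaded on demand.
public class WorkingSetCache<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  public WorkingSetCache(int capacity) {
    super(16, 0.75f, true);          // access-order gives LRU behaviour
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > capacity;        // evict entries beyond the working set
  }
}
{code}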

> Building HDFS on top of Ozone's storage containers
> --
>
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the 
> metadata. The storage container layer provides an object storage interface 
> and aims to manage data/metadata in a distributed manner. More details about 
> storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general 
> idea is:
> # Each block can be treated as an object and the block ID is the object's key.
> # Blocks will still be stored in DataNodes but as objects in storage 
> containers.
> # The block management work can be separated out of the NameNode and will be 
> handled by the storage container layer in a more distributed way. The 
> NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs which 
> are used as keys to obtain real data from storage containers.
> # A new DFSClient implementation talks to both NameNode and the storage 
> container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this 
> design. Currently the NameNode's heaviest workload comes from the block 
> management, which includes maintaining the block-DataNode mapping, receiving 
> full/incremental block reports, tracking block states (under/over/miss 
> replicated), and joining every writing pipeline protocol to guarantee the 
> data consistency. These work bring high memory footprint and make NameNode 
> suffer from GC. HDFS-5477 already proposes to convert BlockManager as a 
> service. If we can build HDFS on top of the storage container layer, we not 
> only separate out the BlockManager from the NameNode, but also replace it 
> with a new distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the 
> work proposed here is still in an experimental/exploring stage. We can do 
> this experiment in a feature branch so that people with interests can be 
> involved.
> A design doc will be uploaded later explaining more details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7240) Object store in HDFS

2017-11-03 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-7240:
---
Attachment: HDFS Scalability and Ozone.pdf

I have added a document that explains a design for scaling HDFS and how Ozone 
paves the way towards the full solution.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9244) Support nested encryption zones

2016-01-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107238#comment-15107238
 ] 

Sanjay Radia commented on HDFS-9244:


The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is 
it such a big deal for an admin to set up EZs as he creates the directories? I 
think nested encryption will complicate things like volumes down the road and I 
don't think this extra complexity is necessary.
I will comment on the volumes jira to drive that discussion to a conclusion. 

> Support nested encryption zones
> ---
>
> Key: HDFS-9244
> URL: https://issues.apache.org/jira/browse/HDFS-9244
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Zhe Zhang
> Attachments: HDFS-9244.00.patch, HDFS-9244.01.patch
>
>
> This JIRA is opened to track adding support of nested encryption zone based 
> on [~andrew.wang]'s [comment 
> |https://issues.apache.org/jira/browse/HDFS-8747?focusedCommentId=14654141=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14654141]
>  for certain use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8747) Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones

2016-01-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106889#comment-15106889
 ] 

Sanjay Radia edited comment on HDFS-8747 at 1/19/16 7:23 PM:
-

Posted on wrong jira by mistake (moving comment to HDFS-9244)
-The main motivation for nested EZ is root + subdirs as per Andrew's comment. 
Is it such a big deal for an admin to set up EZ as he creates the directories 
in dirs? I think nested encryption will complicate things like volumes down the 
road and I don't think this extra complexity is necessary.-


was (Author: sanjay.radia):
The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is 
it such a big deal for an admin to set up EZ as he creates the directories in 
dirs? I think nested encryption will complicate things like volumes down the 
road and I don't think this extra complexity is necessary.

> Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption 
> Zones
> --
>
> Key: HDFS-8747
> URL: https://issues.apache.org/jira/browse/HDFS-8747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, 
> HDFS-8747-07292015.pdf
>
>
> HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to 
> allow create encryption zone on top of a single HDFS directory. Files under 
> the root directory of the encryption zone will be encrypted/decrypted 
> transparently upon HDFS client write or read operations. 
> Generally, it does not support rename(without data copying) across encryption 
> zones or between encryption zone and non-encryption zone because different 
> security settings of encryption zones. However, there are certain use cases 
> where efficient rename support is desired. This JIRA is to propose better 
> support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft 
> Delete” (a.k.a. trash) with HDFS encryption zones.
> “Scratch Space” is widely used in Hadoop jobs, which requires efficient 
> rename support. Temporary files from MR jobs are usually stored in staging 
> area outside encryption zone such as “/tmp” directory and then rename to 
> targeted directories as specified once the data is ready to be further 
> processed. 
> Below is a summary of supported/unsupported cases from latest Hadoop:
> * Rename within the encryption zone is supported
> * Rename the entire encryption zone by moving the root directory of the zone  
> is allowed.
> * Rename sub-directory/file from encryption zone to non-encryption zone is 
> not allowed.
> * Rename sub-directory/file from encryption zone A to encryption zone B is 
> not allowed.
> * Rename from non-encryption zone to encryption zone is not allowed.
> “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that 
> helps prevent accidental deletion of files and directories. If trash is 
> enabled and a file or directory is deleted using the Hadoop shell, the file 
> is moved to the .Trash directory of the user's home directory instead of 
> being deleted.  Deleted files are initially moved (renamed) to the Current 
> sub-directory of the .Trash directory with original path being preserved. 
> Files and directories in the trash can be restored simply by moving them to a 
> location outside the .Trash directory.
> Due to the limited rename support, delete sub-directory/file within 
> encryption zone with trash feature is not allowed. Client has to use 
> -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved 
> the error message but without a complete solution to the problem. 
> We propose to solve the problem by generalizing the mapping between 
> encryption zone and its underlying HDFS directories from 1:1 today to 1:N. 
> The encryption zone should allow non-overlapped directories such as scratch 
> space or soft delete "trash" locations to be added/removed dynamically after 
> creation. This way, rename for "scratch space" and "soft delete" can be 
> better supported without breaking the assumption that rename is only 
> supported "within the zone". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8747) Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones

2016-01-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106889#comment-15106889
 ] 

Sanjay Radia commented on HDFS-8747:


The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is 
it such a big deal for an admin to set up EZs as he creates the directories? I 
think nested encryption will complicate things like volumes down the road and I 
don't think this extra complexity is necessary.

> Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption 
> Zones
> --
>
> Key: HDFS-8747
> URL: https://issues.apache.org/jira/browse/HDFS-8747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, 
> HDFS-8747-07292015.pdf
>
>
> HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to 
> allow create encryption zone on top of a single HDFS directory. Files under 
> the root directory of the encryption zone will be encrypted/decrypted 
> transparently upon HDFS client write or read operations. 
> Generally, it does not support rename(without data copying) across encryption 
> zones or between encryption zone and non-encryption zone because different 
> security settings of encryption zones. However, there are certain use cases 
> where efficient rename support is desired. This JIRA is to propose better 
> support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft 
> Delete” (a.k.a. trash) with HDFS encryption zones.
> “Scratch Space” is widely used in Hadoop jobs, which requires efficient 
> rename support. Temporary files from MR jobs are usually stored in staging 
> area outside encryption zone such as “/tmp” directory and then rename to 
> targeted directories as specified once the data is ready to be further 
> processed. 
> Below is a summary of supported/unsupported cases from latest Hadoop:
> * Rename within the encryption zone is supported
> * Rename the entire encryption zone by moving the root directory of the zone  
> is allowed.
> * Rename sub-directory/file from encryption zone to non-encryption zone is 
> not allowed.
> * Rename sub-directory/file from encryption zone A to encryption zone B is 
> not allowed.
> * Rename from non-encryption zone to encryption zone is not allowed.
> “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that 
> helps prevent accidental deletion of files and directories. If trash is 
> enabled and a file or directory is deleted using the Hadoop shell, the file 
> is moved to the .Trash directory of the user's home directory instead of 
> being deleted.  Deleted files are initially moved (renamed) to the Current 
> sub-directory of the .Trash directory with original path being preserved. 
> Files and directories in the trash can be restored simply by moving them to a 
> location outside the .Trash directory.
> Due to the limited rename support, delete sub-directory/file within 
> encryption zone with trash feature is not allowed. Client has to use 
> -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved 
> the error message but without a complete solution to the problem. 
> We propose to solve the problem by generalizing the mapping between 
> encryption zone and its underlying HDFS directories from 1:1 today to 1:N. 
> The encryption zone should allow non-overlapped directories such as scratch 
> space or soft delete "trash" locations to be added/removed dynamically after 
> creation. This way, rename for "scratch space" and "soft delete" can be 
> better supported without breaking the assumption that rename is only 
> supported "within the zone". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8888) Support volumes in HDFS

2015-08-18 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702393#comment-14702393
 ] 

Sanjay Radia commented on HDFS-8888:


There are several motivations for introducing Volumes to HDFS.

Simplify management and implementation
* Volumes make the management of some HDFS features simpler: Quotas, 
Encryption, and Snapshots can become volume properties rather than properties 
of individual directories. As a unit of management, Volumes also offer strong 
isolation in the security settings.
* Volumes can simplify the implementation of some of them. For example, if we 
don’t allow renaming across a volume boundary then the Snapshots implementation 
becomes easier. Will customers accept this restriction? Won’t some apps like 
Hive have to change since they rename from temp to final destination? Recall we 
disallow renames across encryption zones and customers have found that 
acceptable. Further, we changed Hive to deal with this restriction.
* Volumes can also simplify the management of datasets. For example, one can 
associate different policies with volumes, such as backup policies across DR 
zones. 

Isn’t it more flexible to have features like encryption and snapshots on 
arbitrary directories? Having a car with independent steering for each wheel is 
more flexible, but steering 2 wheels together makes a car easier to control. 
Volumes, while restricting the granularity, will simplify management and also 
the implementation.

*Relation to Federation*
How are volumes related to Federation? Currently in federation, each NN has a 
single volume. This Jira will allow each NN to have multiple volumes. Volumes 
add to the Federation model: one can distribute/load-balance volumes across 
NNs. Further, it allows N+K failover, especially when we add partial namespace 
caching (HDFS-). (More on this later.)

Other things to explore with Volumes (outside the scope of this Jira)
* Each volume could become its own RW lock within the NN. This would improve 
parallelism within the NN without much additional effort (a minimal sketch 
follows below).
* Each volume could have its own image/journal to allow relocation of a volume 
to another NN (see federation).
* Associate storage policies with a volume, such that the volume is backed by 
the same storage. This semantic allows new features like co-located data.
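A minimal sketch of the per-volume RW lock idea (my own illustration; class and 
method names are hypothetical, not proposed APIs):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Instead of one global namesystem lock, each volume gets its own RW lock,
// so operations on different volumes can proceed in parallel.
public class VolumeLocks {
  private final Map<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

  private ReadWriteLock lockFor(String volume) {
    return locks.computeIfAbsent(volume, v -> new ReentrantReadWriteLock());
  }

  public <T> T read(String volume, Supplier<T> op) {
    ReadWriteLock l = lockFor(volume);
    l.readLock().lock();
    try { return op.get(); } finally { l.readLock().unlock(); }
  }

  public void write(String volume, Runnable op) {
    ReadWriteLock l = lockFor(volume);
    l.writeLock().lock();
    try { op.run(); } finally { l.writeLock().unlock(); }
  }
}
{code}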





 Support volumes in HDFS
 ---

 Key: HDFS-
 URL: https://issues.apache.org/jira/browse/HDFS-
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai

 There are multiple types of zones (e.g., snapshottable directories, 
 encryption zones, directories with quotas) which are conceptually close to 
 namespace volumes in traditional file systems.
 This jira proposes to introduce the concept of volume to simplify the 
 implementation of snapshots and encryption zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8888) Support volumes in HDFS

2015-08-18 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702393#comment-14702393
 ] 

Sanjay Radia edited comment on HDFS-8888 at 8/19/15 3:58 AM:
-

There are several motivations for introducing Volumes to HDFS.

Simplify management and implementation
* Volumes make the management of some HDFS features simpler: Quotas, 
Encryption, and Snapshots can become volume properties rather than properties 
of individual directories. As a unit of management, Volumes also offer strong 
isolation in the security settings.
* Volumes can simplify the implementation of some of them. For example, if we 
don’t allow renaming across a volume boundary then the Snapshots implementation 
becomes easier. Will customers accept this restriction? Won’t some apps like 
Hive have to change since they rename from temp to final destination? Recall we 
disallow renames across encryption zones and customers have found that 
acceptable. Further, we changed Hive to deal with this restriction.
* Volumes can also simplify the management of datasets. For example, one can 
associate different policies with volumes, such as backup policies across DR 
zones. 

Isn’t it more flexible to have features like encryption and snapshots on 
arbitrary directories? Having a car with independent steering for each wheel is 
more flexible, but steering 2 wheels together makes a car easier to control. 
Volumes, while restricting the granularity, will simplify management and also 
the implementation.

*Relation to Federation*
How are volumes related to Federation? Currently in federation, each NN has a 
single volume. This Jira will allow each NN to have multiple volumes. Volumes 
add to the Federation model: one can distribute/load-balance volumes across 
NNs. Further, it allows N+K failover, especially when we add partial namespace 
caching (HDFS-). (More on this later.)

*Other things to explore with Volumes* (outside the scope of this Jira)
* Each volume could become its own RW lock within the NN. This would improve 
parallelism within the NN without much additional effort.
* Each volume could have its own image/journal to allow relocation of a volume 
to another NN (see federation).
* Associate storage policies with a volume, such that the volume is backed by 
the same storage. This semantic allows new features like co-located data.






was (Author: sanjay.radia):
There are several motivations for introducing Volumes to HDFS.

Simplify management and implementation
* Volumes make  the management of some HDFS features simpler: Quotas, 
Encryption, Snapshots can become volume properties rather than properties of 
individual directories. As a unit of management, Volumes also offers strong 
isolations in the security settings.
* It can simplify  the implementation of some them. For example if we don’t 
allow renaming across a volume boundary then Snapshots’ implementation become 
easier.  Will customers accept this restriction? Won’t some apps like Hive have 
to change since they rename from temp to final destination? Recall we disallow 
renames across encryption zones and customers have found that acceptable. 
Further, we changed Hive to  deal with this restriction.
* Volumes can also simplify the management of datasets. For example one can 
associate different other policies for volumes. For example one can setup 
backup policies across DR zones based on volumes. 

Isn’t it  more flexible to have features like encryption, snapshots on 
arbitrary directories? Having a car with independent steering for each wheel is 
more flexible, but steering 2 wheels together makes a car easier to control. 
Volumes, while restricting the granularity, will simplify management and also 
the implementation.

*Relation to Federation*
How are volumes related to Federation? Currently in federation, each NN has a 
single volume. This Jira will allow each NN to have multiple volumes. Volumes 
adds to the Federation model. One can distribute/load balance volumes across 
NNs. Further it allows N+K failover especially when we add partial namespace 
caching (HDFS-). (More on this later.)

Other things to explore with Volumes (outside the scope of this Jira)
* Each volume could become its own RW lock with in the NN. This would improve 
parallelism within NN without much additional effort.
* Each volume could have its own image/journal to allow relocation of a volume 
to another NN (see federation).
* Associate storage policies with  a volume such as the volume is  backed by 
the same storage. The semantic allows new features like co-located data.





 Support volumes in HDFS
 ---

 Key: HDFS-
 URL: https://issues.apache.org/jira/browse/HDFS-
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai

 There 

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588681#comment-14588681
 ] 

Sanjay Radia commented on HDFS-7923:


bq. This change is really helpful during startup on big clusters. In the past 
we have seen restarting all the DNs at once on a several hundred node cluster 
bring the NN to its knees. 

There is already a random backoff for the initial block report. You can 
configure the  initial BR backoff time. When that jira was done there was a 
proposal to give each DN a different backoff time depending on the number of 
outstanding BRs; this enhancement was not done at that time because this 
backoff worked very well. For a several hundred node cluster the initial BR 
backoff time should be approx 60sec.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-16 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588931#comment-14588931
 ] 

Sanjay Radia commented on HDFS-7923:


Starvation: I didn't literally mean starvation but was more concerned about 
fairness and safety. Can a DN's block report be delayed for some significant 
period of time, or, due to a subtle bug, for even longer? Our current 
implementation is very resilient - DNs just send the BRs at a specific period 
irrespective of the NN. Does your design have a safety net - say, a DN will 
wait a max of 2 periods to get permission (or something like that)?
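To make the safety-net question concrete, here is a hypothetical DN-side sketch 
(my own illustration, not the patch's actual code): the DN asks permission on 
each heartbeat, but after a bounded number of denials it sends the full block 
report anyway.

{code:java}
// Hypothetical DN-side gate: request FBR permission on each heartbeat, but
// never defer more than MAX_DEFERRED_PERIODS so a full report always gets out.
public class BlockReportGate {
  private static final int MAX_DEFERRED_PERIODS = 2;   // the "safety net"
  private int deferredPeriods = 0;

  /** nnGrantedPermission is the optional boolean in the heartbeat response. */
  public boolean shouldSendFullBlockReport(boolean nnGrantedPermission) {
    if (nnGrantedPermission || deferredPeriods >= MAX_DEFERRED_PERIODS) {
      deferredPeriods = 0;
      return true;                   // send the FBR now
    }
    deferredPeriods++;               // back off and ask again on the next heartbeat
    return false;
  }
}
{code}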

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587054#comment-14587054
 ] 

Sanjay Radia commented on HDFS-7923:


How will you ensure that a particular DN does not get starved? I.e., how do you 
guarantee that BRs will get through? HDFS depends on periodic BRs for 
correctness. I recall discussions with Facebook where they changed their HDFS 
to use incremental BRs but still kept full BRs at a lower frequency just for 
safety.

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

2015-06-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587163#comment-14587163
 ] 

Sanjay Radia commented on HDFS-7923:


Have you considered a pull model (NN pulls) which does not risk starvation?

 The DataNodes should rate-limit their full block reports by asking the NN on 
 heartbeat messages
 ---

 Key: HDFS-7923
 URL: https://issues.apache.org/jira/browse/HDFS-7923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
 HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, 
 HDFS-7923.006.patch, HDFS-7923.007.patch


 The DataNodes should rate-limit their full block reports.  They can do this 
 by first sending a heartbeat message to the NN with an optional boolean set 
 which requests permission to send a full block report.  If the NN responds 
 with another optional boolean set, the DN will send an FBR... if not, it will 
 wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8401) Memfs - a layered file system for in-memory storage in HDFS

2015-05-28 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564043#comment-14564043
 ] 

Sanjay Radia commented on HDFS-8401:


Consider the following use case: one wants to run a few jobs and cache the 
input and the intermediate output just for the duration of these jobs. Today 
the user has to pin such data by changing the dir/file attributes, and when the 
jobs are finished he has to reset the attributes. It is easier to say jobxxx 
input=memfs://.../input tmp=memfs://.../tmpdir output=. Here setting the 
scheme is not inconvenient since it is part of the parameters to a program. 
Further, this works with any existing application - Hive, Pig etc. - since the 
hint to cache is in the scheme of the pathname. Our existing policies and 
dir-level settings work when things are semi-permanent (i.e. this dir has 
dimension tables, please cache them - all jobs will benefit). In addition, we 
could add (or already have) programmatic APIs to indicate that a file being 
read or written needs to be cached, but this requires changes to the 
application code. Once we get fully automated memory caching working we will 
not need our existing storage policies nor layers like memfs, since the system 
will just take care of it all - but it will take us some time to get there. 

I think both approaches have their own strengths and are complementary. Note 
spark-tachyon uses a layered file system and the approach is viewed as a simple 
way to control which files get cached on a per-job basis.

Further, one can also cache specific Hive tables in the Hive metastore by 
giving a path name that has the memfs scheme. Here the memfs pathname and 
setting the dir's attribute are roughly equal from an ease-of-use perspective.

An additional point about memfs for non-HDFS systems: the memfs *abstraction* 
allows caching S3 data in a very similar fashion. Of course, one will have to 
build a full caching implementation of memfs for S3, because the memfs proposed 
in this Jira is a very thin layer over HDFS - ALL the caching mechanism is 
already in HDFS. So I expect several implementations of the memfs interface for 
HCFS file systems.
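A minimal sketch of the 1:1 path mapping that lets memfs carry no metadata of 
its own (my own illustration using plain java.net.URI; the real layer would 
delegate to the HDFS client and tag writes/reads with the in-memory hint):

{code:java}
import java.net.URI;
import java.net.URISyntaxException;

// memfs paths map 1:1 onto hdfs paths: the layer only rewrites the scheme and
// (conceptually) adds the lazy-persist / cache hint when talking to HDFS.
public class MemfsPathMapper {
  public static URI toHdfs(URI memfsPath) throws URISyntaxException {
    if (!"memfs".equals(memfsPath.getScheme())) {
      throw new IllegalArgumentException("not a memfs path: " + memfsPath);
    }
    return new URI("hdfs", memfsPath.getAuthority(), memfsPath.getPath(),
        memfsPath.getQuery(), memfsPath.getFragment());
  }

  public static void main(String[] args) throws URISyntaxException {
    // prints hdfs://nn:8020/jobs/tmpdir/part-0000
    System.out.println(toHdfs(new URI("memfs://nn:8020/jobs/tmpdir/part-0000")));
  }
}
{code}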

 Memfs - a layered file system for in-memory storage in HDFS
 ---

 Key: HDFS-8401
 URL: https://issues.apache.org/jira/browse/HDFS-8401
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal

 We propose creating a layered filesystem that can provide in-memory storage 
 using existing features within HDFS. memfs will use lazy persist writes 
 introduced by HDFS-6581. For reads, memfs can use the Centralized Cache 
 Management feature introduced in HDFS-4949 to load hot data to memory.
 Paths in memfs and hdfs will correspond 1:1 so memfs will require no 
 additional metadata and it can be implemented entirely as a client-side 
 library.
 The advantage of a layered file system is that it requires little or no 
 changes to existing applications. e.g. Applications can use something like 
 {{memfs://}} instead of {{hdfs://}} for files targeted to memory storage. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8241) Remove unused Namenode startup option FINALIZE

2015-04-27 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514487#comment-14514487
 ] 

Sanjay Radia commented on HDFS-8241:


Yes it was unfortunate that this incompatibility was missed.
Q. Do folks feel that a startup -finalize option to the NN is a good interface 
in ADDITION to the admin command to finalize?
Clearly we need the admin command to finalize since one does not want to 
restart the NN to finalize. 

 Remove unused Namenode startup option  FINALIZE
 -

 Key: HDFS-8241
 URL: https://issues.apache.org/jira/browse/HDFS-8241
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: HDFS-8241.patch


 Command : hdfs namenode -finalize
 15/04/24 22:26:23 INFO namenode.NameNode: createNameNode [-finalize]
  *Use of the argument 'FINALIZE' is no longer supported.*  To finalize an 
 upgrade, start the NN  and then run `hdfs dfsadmin -finalizeUpgrade'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8075) Revist layout version

2015-04-07 Thread Sanjay Radia (JIRA)
Sanjay Radia created HDFS-8075:
--

 Summary: Revist layout version
 Key: HDFS-8075
 URL: https://issues.apache.org/jira/browse/HDFS-8075
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Sanjay Radia


Background
* HDFS image layout was changed to use Protobufs to allow easier forward and 
backward compatibility.
* HDFS has a layout version which is changed on each change (even if only an 
optional protobuf field was added).
* Hadoop supports two ways of going back during an upgrade:
** downgrade: go back to the old binary version but use the existing 
image/edits so that newly created files are not lost
** rollback: go back to the checkpoint created before the upgrade was started - 
hence newly created files are lost.

Layout needs to be revisited if we want to support downgrade in some 
circumstances, which we don't today. Here are use cases:
* Some changes can support downgrade even though there was a change in layout, 
since there is no real data loss but only loss of new functionality. E.g. when 
we added ACLs one could have downgraded - there is no data loss but you will 
lose the newly created ACLs. That is acceptable for a user since one does not 
expect to retain the newly added ACLs in an old version.
* Some changes may lead to data loss if the functionality was used. For 
example, the recent truncate will cause data loss if the functionality was 
actually used. Now one can tell admins NOT to use such new features till the 
upgrade is finalized, in which case one could potentially support downgrade.
* A fairly fundamental change to layout where a downgrade is not possible but a 
rollback is. Say we change the layout completely from protobuf to something 
else. Another example is when HDFS moves to support partial namespace in 
memory - there is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8075) Revist layout version

2015-04-07 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia resolved HDFS-8075.

Resolution: Duplicate

Closed: duplicate of HDFS-5223

 Revist layout version
 -

 Key: HDFS-8075
 URL: https://issues.apache.org/jira/browse/HDFS-8075
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Sanjay Radia

 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * Hdfs has a layout version which is changed on each change (even if it an  
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade is some 
 circumstances which we dont today. Here are use cases:
 * Some changes can support downgrade even though they was a change in layout 
 since there is not real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data-loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT use such new such new features 
 till the upgrade is finalized in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support partial namespace in 
 memory - they is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483654#comment-14483654
 ] 

Sanjay Radia commented on HDFS-5223:


For the edits one could require that in order to downgrade you must do a 
save-image and then delete the (now empty) edits log. We would then limit our 
solution to the image. For the image we could do the following:
* Add a *second* layout version field (call it compatible-layout-version) 
that indicates which version can safely read the image without data loss. A NN 
that starts up will compare this field with its current layout version and then 
proceed as long as the edits are empty.
** The ACL example (see the Jira description) will state that the previous 
version can safely read the image without data loss. Of course newly created 
ACLs would be lost.
** The truncate example is tricky: one can safely downgrade if the truncate 
operation was not used. We could add code to not allow such new features till 
finalize is done. This is somewhat analogous to what ext3 was trying to do with 
its superblock feature flags (see Todd's comment above); what I am proposing is 
slightly different since it limits such features till the upgrade is finalized, 
while ext3's approach is more general in that you can downgrade at any time as 
long as you have not used the feature. Alternatively, we could simply not 
support downgrade for such a feature and simply mark the 
compatible-layout-version accordingly.
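A small sketch of the proposed check (illustrative only; the field and method 
names are made up): an older NN binary may start on an image whose 
compatible-layout-version it supports, provided the edits are empty.

{code:java}
// Hypothetical downgrade check for the proposal above. In HDFS, newer layout
// versions are numerically smaller (more negative), so "software is at least
// as new as X" translates to softwareLV <= X.
public class LayoutCheck {
  public static boolean canStartOn(int softwareLV,
                                   int imageLV,
                                   int imageCompatibleLV,
                                   boolean editsAreEmpty) {
    if (softwareLV <= imageLV) {
      return true;        // same or newer software: normal startup/upgrade path
    }
    // Older software (downgrade): only safe if the image declares this version
    // compatible and there are no un-replayed edits in the newer format.
    return editsAreEmpty && softwareLV <= imageCompatibleLV;
  }
}
{code}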





 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are version by the single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process which requires coordination with DNs, etc. HDFS should 
 support a lighter weight alternative.
 Copied description from HDFS-8075 which is a duplicate and now closed.
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * Hdfs has a layout version which is changed on each change (even if it an  
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade is some 
 circumstances which we dont today. Here are use cases:
 * Some changes can support downgrade even though they was a change in layout 
 since there is not real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data-loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT use such new such new features 
 till the upgrade is finalized in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support partial namespace in 
 memory - they is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8075) Revist layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483602#comment-14483602
 ] 

Sanjay Radia commented on HDFS-8075:


Here is a proposal:
* Add a *second* layout version field (call it compatible-layout-version) 
that states which version can safely read the image without data loss.
** The ACL example will state that the previous version can safely read the 
image without data loss.
** The truncate example is tricky: one can safely downgrade if the truncate 
operation was not used. We could add code to not allow such new features till 
finalize is done. Or we could say we don't support downgrade for such a feature 
and simply mark the compatible-layout-version accordingly.



 Revist layout version
 -

 Key: HDFS-8075
 URL: https://issues.apache.org/jira/browse/HDFS-8075
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Sanjay Radia

 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * Hdfs has a layout version which is changed on each change (even if it an  
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade is some 
 circumstances which we dont today. Here are use cases:
 * Some changes can support downgrade even though they was a change in layout 
 since there is not real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data-loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT use such new such new features 
 till the upgrade is finalized in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support partial namespace in 
 memory - they is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483654#comment-14483654
 ] 

Sanjay Radia edited comment on HDFS-5223 at 4/7/15 6:29 PM:


For the edits one could require that in order to downgrade you must do a 
save-image and then delete the (now empty) edits log. We would then limit our 
solution to the image. For the image we could do the following:

Add a *second* layout version field (call it compatible-layout-version) that 
indicates which version can safely read the image without data loss. A NN that 
starts up will compare this field with its current layout version and then 
proceed as long as the edits are empty.
* The ACL example (see the Jira description) will state that the previous 
version can safely read the image without data loss. Of course newly created 
ACLs would be lost.
* The truncate example is tricky: one can safely downgrade if the truncate 
operation was not used. We could add code to not allow such new features till 
finalize is done. This is somewhat analogous to what ext3 was trying to do with 
its superblock feature flags (see Todd's comment above); what I am proposing is 
slightly different since it limits such features till the upgrade is finalized, 
while ext3's approach is more general in that you can downgrade at any time as 
long as you have not used the feature. We could also do the following slight 
variation if finalize seems too arbitrary: once you use the new feature (e.g., 
truncate), simply change the compatible-layout-version to be the current one 
and this will safely prevent older binaries from reading it. Of course, 
alternatively, we could simply not support downgrade for such a feature and 
simply mark the compatible-layout-version accordingly.






was (Author: sanjay.radia):
For the edits one could require that in order to downgrade you must do a 
save-image and then delete the null edits-log. We would then limit our 
solution to the image. For the image we could do the following
* Add a *second* layout version field (call it compatible-layout-version) 
that indicates which version can safely read the image without data-loss. A NN 
that starts up will compare this field with its current layout version and then 
proceed as long as the edits is null.
** The ACL example (see Jira description) will state that the previous version  
can safely read the image without data loss. Of course newly created ACLs would 
be lost.
** Truncate example is tricky: one can safely downgrade if the truncate 
operation was not used. We could add code to not allow such new features till 
finalize is done.  This is somewhat analogous to what ext3 was trying to do 
with  its superblock feature flags (see Todd's comment above); what I am 
proposing is slightly different since it limits such features till upgrade is 
finalized while ext3's approach is more general in that you can downgrade at 
anytime as long as you have used the feature.  Alternatively, we could simply 
not support downgrade for such a feature and simply mark the 
compatible-layout-version accordingly.





 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are version by the single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process which requires coordination with DNs, etc. HDFS should 
 support a lighter weight alternative.
 Copied description from HDFS-8075 which is a duplicate and now closed.
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * Hdfs has a layout version which is changed on each change (even if it an  
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade is some 
 circumstances which we dont today. Here are use cases:
 * Some changes can support downgrade even though they was a change in layout 
 since there is not real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the 

[jira] [Updated] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5223:
---
Description: 
Currently all HDFS on-disk formats are versioned by the single layout version. 
This means that even for changes which might be backward compatible, like the 
addition of a new edit log op code, we must go through the full `namenode 
-upgrade' process which requires coordination with DNs, etc. HDFS should 
support a lighter-weight alternative.


Copied description from HDFS-8075 which is a duplicate and now closed.
Background
* HDFS image layout was changed to use Protobufs to allow easier forward and 
backward compatibility.
* HDFS has a layout version which is changed on each change (even if only an 
optional protobuf field was added).
* Hadoop supports two ways of going back during an upgrade:
** downgrade: go back to the old binary version but use the existing 
image/edits so that newly created files are not lost
** rollback: go back to the checkpoint created before the upgrade was started - 
hence newly created files are lost.

Layout needs to be revisited if we want to support downgrade in some 
circumstances, which we don't today. Here are use cases:
* Some changes can support downgrade even though there was a change in layout, 
since there is no real data loss but only loss of new functionality. E.g. when 
we added ACLs one could have downgraded - there is no data loss but you will 
lose the newly created ACLs. That is acceptable for a user since one does not 
expect to retain the newly added ACLs in an old version.
* Some changes may lead to data loss if the functionality was used. For 
example, the recent truncate will cause data loss if the functionality was 
actually used. Now one can tell admins NOT to use such new features till the 
upgrade is finalized, in which case one could potentially support downgrade.
* A fairly fundamental change to layout where a downgrade is not possible but a 
rollback is. Say we change the layout completely from protobuf to something 
else. Another example is when HDFS moves to support partial namespace in 
memory - there is likely to be a fairly fundamental change in layout.

  was:Currently all HDFS on-disk formats are version by the single layout 
version. This means that even for changes which might be backward compatible, 
like the addition of a new edit log op code, we must go through the full 
`namenode -upgrade' process which requires coordination with DNs, etc. HDFS 
should support a lighter weight alternative.


 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are versioned by a single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process which requires coordination with DNs, etc. HDFS should 
 support a lighter weight alternative.
 Copied description from HDFS-8075 which is a duplicate and now closed.
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are the use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from 

[jira] [Commented] (HDFS-8075) Revisit layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483606#comment-14483606
 ] 

Sanjay Radia commented on HDFS-8075:


Oops. I will close this as a duplicate. I will copy the description of this Jira 
to HDFS-5223 since it has some good examples.

 Revisit layout version
 -

 Key: HDFS-8075
 URL: https://issues.apache.org/jira/browse/HDFS-8075
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Sanjay Radia

 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are the use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483704#comment-14483704
 ] 

Sanjay Radia commented on HDFS-5223:


The above solution was inspired by Hive's ORC. They have two complementary 
mechanisms to address dealing with old and new binaries. They specify the 
oldest version that can safely read the new data (which inspired the solution I 
gave above), and also new binaries can write in an older format. This second 
mechanism is too burdensome for HDFS. Instead I would prefer to disable the 
new features, after which one cannot downgrade. 

 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are versioned by a single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process which requires coordination with DNs, etc. HDFS should 
 support a lighter weight alternative.
 Copied description from HDFS-8075 which is a duplicate and now closed.
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 **  downgrade: go back to old binary version but use existing image/edits so 
 that newly created files are not lost
 ** rollback: go back to checkpoint created before upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are the use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6200) Create a separate jar for hdfs-client

2015-03-03 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345657#comment-14345657
 ] 

Sanjay Radia commented on HDFS-6200:


+++1 for this proposal.

 Create a separate jar for hdfs-client
 -

 Key: HDFS-6200
 URL: https://issues.apache.org/jira/browse/HDFS-6200
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-6200.000.patch, HDFS-6200.001.patch, 
 HDFS-6200.002.patch, HDFS-6200.003.patch, HDFS-6200.004.patch, 
 HDFS-6200.005.patch, HDFS-6200.006.patch, HDFS-6200.007.patch


 Currently the hadoop-hdfs jar contains both the hdfs server and the hdfs 
 client. As discussed in the hdfs-dev mailing list 
 (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201404.mbox/browser),
  downstream projects are forced to bring in additional dependency in order to 
 access hdfs. The additional dependency sometimes can be difficult to manage 
 for projects like Apache Falcon and Apache Oozie.
 This jira proposes to create a new project, hadoop-hdfs-client, which 
 contains the client side of the hdfs code. Downstream projects can use this 
 jar instead of the hadoop-hdfs to avoid unnecessary dependency.
 Note that it does not break the compatibility of downstream projects. This is 
 because old downstream projects implicitly depend on hadoop-hdfs-client 
 through the hadoop-hdfs jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7745) HDFS should have its own daemon command and not rely on the one in common

2015-02-06 Thread Sanjay Radia (JIRA)
Sanjay Radia created HDFS-7745:
--

 Summary: HDFS should have its own daemon command  and not rely on 
the one in common
 Key: HDFS-7745
 URL: https://issues.apache.org/jira/browse/HDFS-7745
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sanjay Radia


HDFS should have its own daemon command and not rely on the one in common. BTW 
YARN split out its own daemon command during the project split. Note the 
hdfs command does have a --daemon flag and hence the daemon script is merely a 
wrapper. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-08-26 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111655#comment-14111655
 ] 

Sanjay Radia commented on HDFS-6469:


My thoughts:
* I do believe that a Paxos-based NN would give faster failover than what NN HA 
offers today (30 seconds to a few minutes, but typically no more than a minute 
or two). So this is clearly a benefit of CNode, though I have not heard a single 
customer complain about the failover time so far. 
* The proposed solution does not increase the write throughput. 
* The parallel reads advantage of CNode can be achieved in the current HA 
setup with some work (this is discussed above). If this is the main benefit 
then I would rather pursue enhancing the NN standby to support reads. Further, 
there is ongoing work to improve the locking in the NN.
* I share Todd's view that ZK is not a usable reference implementation for 
Paxos. One really needs a Paxos library that can be plugged in rather than an 
external server-based solution like ZK.  

So at this stage I am having a hard time seeing the benefits to justify the 
costs of adding this complexity. I do however understand the overhead that 
WANdisco faces in integrating their solution with HDFS each time HDFS is 
modified. Would a few plugin interfaces make it easier? I would be more than 
happy to support adding such plugins if they would help.
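For illustration only, a rough Java sketch of the kind of plugin interface I have in 
mind; every name here is hypothetical and not an existing HDFS API. The NameNode would 
hand namespace mutations to a pluggable coordinator before applying them: the default 
implementation applies them immediately, while a ConsensusNode-style implementation 
could defer the apply until its coordination engine has agreed on the order.

{code:java}
// Hypothetical plugin interface; names are illustrative, not an existing HDFS API.
import java.io.IOException;

/** A namespace mutation the NameNode wants to apply (e.g. mkdir, create, delete). */
interface NamespaceOp {
  void applyLocally() throws IOException;
}

/** Pluggable hook deciding when and in what order coordinated operations are applied. */
interface NamespaceCoordinator {
  /**
   * Submit an operation. Implementations call back into apply() once the operation
   * has been agreed (consensus engine) or immediately (default NameNode behaviour).
   */
  void submit(NamespaceOp op, ApplyCallback callback) throws IOException;

  interface ApplyCallback {
    void apply(NamespaceOp op) throws IOException;
  }
}

/** Default behaviour: no external coordination, apply in submission order. */
class LocalCoordinator implements NamespaceCoordinator {
  @Override
  public void submit(NamespaceOp op, ApplyCallback callback) throws IOException {
    callback.apply(op);  // a consensus-based plugin would defer this until agreement
  }
}

public class CoordinatorDemo {
  public static void main(String[] args) throws IOException {
    NamespaceCoordinator coordinator = new LocalCoordinator();
    NamespaceOp mkdir = () -> System.out.println("applied: mkdir /tmp/example");
    coordinator.submit(mkdir, NamespaceOp::applyLocally);
  }
}
{code}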

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099243#comment-14099243
 ] 

Sanjay Radia commented on HDFS-6134:


We have made very good progress over the last few days. Thanks for taking the 
time for the offline technical discussions.  Below is a  summary of   the 
concerns I have raised previously in this Jira.
# Fix distcp and cp to *automatically* deal with EZ using /r/r internally (a 
rough sketch of such a raw copy is below). Initially we need to support only 
row 1 and row 4 in the table I attached in HADOOP-10919.
# Fix Webhdfs to use KMS delegation tokens so that webhdfs can be used with 
transparent encryption without giving user hdfs KMS proxy permission (and as 
a result to admins). REST is a key protocol for HDFS and for many Hadoop use 
cases, and an Admin should not have access to the keys of encrypted files.
# Further work on specifying what HAR should do (I have listed some use cases 
and proposed solutions), and then follow it up with a fix to har.
# Some work on understanding the availability and scalability of KMS for medium 
to large clusters. Perhaps we need to explore getting the keys ahead of time 
when a job is submitted.

Let's complete items 1 and 2 promptly. Before we publish transparent encryption 
in a 2.x release for public consumption, let us at least complete item 1 (i.e. 
distcp and cp) and the flag to turn this feature on/off.
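A minimal Java sketch of the kind of raw copy meant in item 1, assuming the 
/.reserved/raw prefix discussed in HDFS-6509; the cluster URIs and paths are made up, 
and this is an illustration of the idea rather than the final distcp behaviour (a real 
copy also needs the raw xattrs carrying the per-file encrypted key to travel with the 
file).

{code:java}
// Sketch: copy a file between two clusters as raw (still-encrypted) bytes via the
// /.reserved/raw prefix, so no KMS access is needed and data never appears in the clear.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class RawEncryptedCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Source and destination clusters (hypothetical host names).
    FileSystem srcFs = FileSystem.get(URI.create("hdfs://source-cluster:8020"), conf);
    FileSystem dstFs = FileSystem.get(URI.create("hdfs://backup-cluster:8020"), conf);

    // Reading under /.reserved/raw returns the encrypted bytes as stored on disk.
    Path rawSrc = new Path("/.reserved/raw/ezone/project/data.avro");
    Path rawDst = new Path("/.reserved/raw/ezone/project/data.avro");

    // FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf)
    FileUtil.copy(srcFs, rawSrc, dstFs, rawDst, false, true, conf);
  }
}
{code}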



 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099311#comment-14099311
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro. Wrt the subtle difference between webhdfs vs httpfs: can an admin 
grab the EDEKs and raw files, then log into the httpfs machine, become user 
httpfs, and trick the KMS into decrypting the keys because httpfs has the 
proxy setting?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096614#comment-14096614
 ] 

Sanjay Radia commented on HDFS-6134:


I get your point about client-side code for webhdfs. I do agree that   httpfs 
is a proxy but do you want it to have blanket access to all keys?

My main concern is that this jira completely breaks webhdfs. Do you find that 
acceptable?? There are so many users of this protocol.
BTW did you see my earlier attempt at the solution (13.06 today) - does that 
work?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096617#comment-14096617
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro, can you please summarize your explanation for why, during file 
creation, the NN requests the KMS to create a new EDEK rather than having the 
client do it. Suresh raised the same concern that I did at our meeting 
yesterday. Thanks

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097330#comment-14097330
 ] 

Sanjay Radia commented on HDFS-6134:


Had a chat with Owen over the webhdfs issue and the solution I had proposed in 
[comment | 
https://issues.apache.org/jira/browse/HDFS-6134?focusedCommentId=14096027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14096027].
 He said that restricting the client connections from user hdfs is not 
necessary: the DN does a doAs(user). KMS is configured for hdfs to be a proxy, 
but it also blacklists hdfs (and other superusers). That is, the DN as a proxy 
cannot get a key for user hdfs but it can get the keys for other users. So this 
brings the httpfs and webhdfs solutions in line with each other.

Owen proposed another solution where the httpfs or DN daemons do *not* need to 
be trusted proxies for the KMS. The user simply passes a KMS delegation token 
in the REST request (we already pass HDFS delegation tokens). 
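To make the doAs(user) pattern concrete, a minimal Java sketch is below. The end-user 
name and file path are made up for illustration; the point is only the shape of the 
pattern: the daemon authenticates as its own service principal, impersonates the end 
user, and any key material is then obtained by the client-side crypto code as that end 
user rather than as hdfs or httpfs.

{code:java}
// Minimal sketch of the doAs(user) pattern discussed above; names/paths are made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;
import java.security.PrivilegedExceptionAction;

public class ProxyReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // The daemon (e.g. the httpfs server or a DN serving webhdfs) runs as its own
    // service user and impersonates the authenticated end user.
    UserGroupInformation serviceUgi = UserGroupInformation.getLoginUser();
    UserGroupInformation endUser =
        UserGroupInformation.createProxyUser("alice", serviceUgi);

    // Everything inside doAs runs as "alice": permission checks and, with transparent
    // encryption, the decrypt of the file's EDEK, so the KMS can blacklist the
    // service/superusers themselves while still letting them proxy for real end users.
    byte[] firstBytes = endUser.doAs((PrivilegedExceptionAction<byte[]>) () -> {
      FileSystem fs = FileSystem.get(conf);
      try (FSDataInputStream in = fs.open(new Path("/ezone/alice/report.csv"))) {
        byte[] buf = new byte[4096];
        int n = in.read(buf);
        return n < 0 ? new byte[0] : java.util.Arrays.copyOf(buf, n);
      }
    });
    System.out.println("read " + firstBytes.length + " bytes as alice");
  }
}
{code}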



 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097610#comment-14097610
 ] 

Sanjay Radia commented on HDFS-6134:


Context: making things work for cp, distcp, har, etc.
Is the following true:
 the EZ master key (EZKey) is only needed for file creation in an EZ subtree. 
After that, for reading or appending to a file, one simply needs the file's 
individual key. If that is true then one can copy raw encrypted files and their 
keys from an EZ to tape, har, tar, etc. and then restore them later and things 
would just work. Also, can one copy raw encrypted files and their keys from an 
EZ to another EZ which has a different EZKey and again things would work?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097687#comment-14097687
 ] 

Sanjay Radia commented on HDFS-6134:


Some thoughts on the Har use cases and possible outcomes:
 1) Har a subtree and the subtree contains an EZ.
 2) Har a subtree rooted at the EZ.
 3) Har a subtree within an EZ.
Typically the subtree is replaced by the har itself, though that is not 
required. The Har is read only.
The operation can be performed by an admin or by a user.

Use case 1 - copy the raw files and the keys into the HAR (i.e. the files inside 
the HAR remain encrypted). When files are accessed from the Har filesystem, the 
same machinery as for HDFS EZ should come into play to allow transparent 
decryption of the files. A user with no KMS permission will not be able to 
decrypt. Someone with read access to the HAR will be able to get to the raw 
files and their keys (how does this compare to the normal HDFS EZ?).
Use case 2 - same as 1.
Use case 3 - if the har is copied elsewhere (i.e. it does not replace the 
subtree) then same as 1. If it does replace the subtree, the HAR will be 
encrypted once again (i.e. double encryption). 


 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097687#comment-14097687
 ] 

Sanjay Radia edited comment on HDFS-6134 at 8/14/14 9:23 PM:
-

Some thoughts on the Har  use cases and possible outcomes:
 1) Har a subtree and the subtree contains an EZ.
 2) Har a subtree rooted at the EZ
 3) Har a subtree within an EZ
Typically the subtree is replaced by the har itself, though not required. The 
Har is read only.
The operation can be performed by an admin or by a user.

Use case 1 - copy the raw files and the keys into the HAR (i.e. the files inside 
the HAR remain encrypted). When files are accessed from the Har filesystem, the 
same machinery as for HDFS EZ should come into play to allow transparent 
decryption of the files. A user with no KMS permission will not be able to 
decrypt. Someone with read access to the HAR will be able to get to the raw 
files and their keys (how does this compare to the normal HDFS EZ?).
Use case 2 - same as 1.
Use case 3 - if the har is copied elsewhere (i.e. it does not replace the 
subtree) then same as 1. If it does replace the subtree, the HAR will be 
encrypted once again (i.e. double encryption). 



was (Author: sanjay.radia):
Soem thoughts on the Har  use cases and possible outcomes:
 1) Har a subtree and the subtree contains an EZ.
 2) Har a subtree rooted at the EZ
 3) Har a subtree within an EZ
Typically the subtree is replaced by the har itself, though not required. The 
Har is read only.
The operation can be performed by an admin or by a user.

Use case 1 - copy the raw files and the keys into the HAR (ie the files inside 
the HAR remain encrypted). When files are accessed from the Har filesystem the 
same machinery as for HDFS EZ should come to play to allow transparent 
decryption of the files. A user with no KMS permission will not be able to 
decrypt. Someone with read access to the HAR will be able to get to the raw 
files and their keys (how does this compare to the normal HDFS EZ?)
Use case 2 - same as 1.
Use case 3.  If the har is copied elsewhere (ie it does not replace the 
subtree) then same as 1. If it does replace subtree the HAR will be encrypted 
once again (ie double encryption). 


 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-14 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097875#comment-14097875
 ] 

Sanjay Radia commented on HDFS-6134:


bq. Had a chat with Owen over the webhdfs issue and the solution I had 
proposed in comment. He said that restricting the client connections from user 
hdfs is not necessary: the DN does a doAs(user). KMS is configured for hdfs 
to be a proxy but it also blacklists hdfs (and other superusers). That is, the 
DN as a proxy cannot get a key for user hdfs but it can get the keys for other 
users. So this brings the httpfs and webhdfs solutions in line with each other.

The above does not work: an admin can log in as hdfs and then pretend to be 
the NN/DN and use the proxy privilege to get DEKs from EDEKs (an admin can read 
EDEKs easily). (Alejandro - thanks for the explanation - I finally get the 
distinction between webhdfs and httpfs.) 

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095204#comment-14095204
 ] 

Sanjay Radia commented on HDFS-6134:


Larry, I don't completely get the difference between webhdfs and httpfs, but I 
think the cause of the difference is that user hdfs is a superuser (note the DN 
runs as hdfs and the webhdfs code is executed on behalf of the end-user inside 
the DN after checking the permissions). Hence I think this would potentially 
open up access to all encrypted files that are readable. However that should 
NOT happen if doAs is used (correct?). 

I agree it would be unacceptable to say that if one enables transparent 
encryption then one should disable webhdfs because it would become insecure. 
Andrew says that "Regarding webhdfs, it's not a recommended deployment" but 
Alejandro says "Both httpfs and webhdfs will work just fine" but then in the 
same paragraph says this could fail some security audits.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096027#comment-14096027
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro, if we treat user "hdfs" as a special user such that the HDFS system 
will not accept any client connections from "hdfs", then does this solve this 
problem? An Admin will not be able to connect as user "hdfs" but can connect 
as user "ClarkKent" where "ClarkKent" is in the superuser group of hdfs so 
that the admin can do his job as superuser. It does mean that we are trusting 
the HDFS code to be correct in not abusing its access to keys since it has 
proxy authority with KMS (this was not required so far).

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096027#comment-14096027
 ] 

Sanjay Radia edited comment on HDFS-6134 at 8/13/14 8:19 PM:
-

Alejandro, a potential solution: treat user "hdfs" as a special user such that 
the HDFS system will NOT accept any client connections from "hdfs". An Admin 
will not be able to connect as user "hdfs" but can connect as user, say, 
"ClarkKent" where "ClarkKent" is in the superuser group of hdfs so that the 
admin can do his job as superuser. It does mean that we are trusting the HDFS 
code to be correct in not abusing its access to keys since it has proxy 
authority with KMS (this was not required so far).


was (Author: sanjay.radia):
Alejandro, if we treat user hdfs as a special user such that the  HDFS system 
will not accept any client connections from  hdfs then does this solve this 
problem?. An Admin will not be able to connect as user hdfs but can connect 
as user ClarkKent where  ClarkKent is in the superuser group of hdfs so 
that the admin can do his job as superuser.  It does means that we are trusting 
the HDFS code to be correct in not abusing its access to keys since it has 
proxy authority with KMS (this was not required so far.)

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096570#comment-14096570
 ] 

Sanjay Radia commented on HDFS-6134:


bq. If you set up httpfs, it runs using the 'httpfs' user, a HDFS regular user 
configured as proxyuser to interact with HDFS and KMS doing doAs calls
Alejandro, we modified the original design in this Jira so that the NN is not 
a proxy for the keys but instead the client gets the keys directly from the 
KMS, because the best practice in encryption is to eliminate proxies (see 
Owen's comment of June 11). With your proposal for httpfs, the httpfs server is 
a proxy to get the keys. Perhaps we are approaching the problem wrong. Consider 
the following alternative: let webhdfs and httpfs simply send the encrypted raw 
data to the client. For the hdfs-native filesystem, the encryption and 
decryption happens on the client side; we should consider the same for the 
REST protocol. Clearly it requires more code on the REST client side.

BTW the webhdfs FileSystem (as opposed to the REST protocol discussed 
above) has a client-side library that can mimic the hdfs filesystem's client 
side.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094485#comment-14094485
 ] 

Sanjay Radia commented on HDFS-6134:


bq. Regarding webhdfs, it's not a recommended deployment. 
The design document in this jira already states that webhdfs just works:
* This Jira provides encryption for HDFS data at rest and allows any 
application to access it via the Hadoop Filesystem Java API, Hadoop libhdfs C 
library, or WebHDFS REST API.
* For HDFS WebHDFS, the DataNodes act as the HDFS client reading/writing files 
since that is where encryption/decryption will happen. For HttpFS, the HttpFS 
server acts as the HDFS client reading/writing files, since that is where 
encryption/decryption will happen.

 Webhdfs not working is worrying because REST is used by many users who do not 
want to deploy hadoop binaries or want to use a non-Java client.
Also I do not understand why httpfs works and webhdfs breaks. Neither will 
be running as the end-user and hence neither will allow transparent encryption. 
Am I missing something?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094527#comment-14094527
 ] 

Sanjay Radia commented on HDFS-6134:


bq. Regarding HAR, could you lay out the usecase ...
Alejandro summarized the problem and also the solution of modifying har in his 
comment of June 24th: 
https://issues.apache.org/jira/browse/HDFS-6134?focusedCommentId=14042797page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14042797
Andrew, you are missing one of the usage models of HAR: the user creating the 
har is not the only user accessing the har - har is a general tool used by an 
admin to compact files and replace the original.

I can think of at least the following use cases so far:
* A subtree being har'ed has a subtree that is an EZ - some files in the har 
will be encrypted and some will not. The reader should be able to transparently 
read each of the two kinds. 
* A subtree being har'ed is part of a subtree that is an EZ - the whole har 
should be encrypted and transparently decrypted when its contents are read.
* A user har's a non-EZ subtree and copies it into an EZ - should just work; as 
you suggest, the whole thing is encrypted and requires that the user has access 
to the keys to read the har.



 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094613#comment-14094613
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro - for both webhdfs and httpfs to work your proposal is that users 
hdfs and httpfs have access to any key (you mention only webhdfs in your 
comment but I suspect you meant both). However with this approach webhdfs and 
httpfs will give access to ALL EZ files to users that have read access.
Correct? This would be unacceptable.

I believe the better solution is for webhdfs and httpfs to access the file by 
doing a doAs(endUser). 

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption

2014-08-12 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094613#comment-14094613
 ] 

Sanjay Radia edited comment on HDFS-6134 at 8/13/14 4:58 AM:
-

Alejandro - for both webhdfs and httpfs to work your proposal is that users 
hdfs and httpfs have access to any key (you mention only webhdfs in your 
comment but I suspect you meant both). However with this approach webhdfs and 
httpfs will give access to ALL EZ files to users that have read access.
Correct? This would be unacceptable.

I believe the better solution is for webhdfs and httpfs to access the file by 
doing a doAs(endUser). 


was (Author: sanjay.radia):
Alejandro - for both webhdfs and httpfs to work your proposal is that users 
hdfs and httpfs have access to any key (you mention only webhdfs in your 
comment but I suspect you meant both). However with this approach webhdfs and 
httpfs will then all access to ALL EZ files to users that have read access.
Correct? This would be unacceptable.

I believe the better solution is for webhdfs and httpfs to access the file by 
doing a doAs(endUser). 

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093538#comment-14093538
 ] 

Sanjay Radia edited comment on HDFS-6134 at 8/12/14 12:19 AM:
--

bq. Charles posted a design doc for how distcp will work with encryption at 
HDFS-6509.
 I did a quick glance over it. We also need to do the same for har. I think the 
same .raw should work ...


was (Author: sanjay.radia):
.bq. Charles posted a design doc for how distcp will work with encryption at 
HDFS-6509.
 I did a quick glance over it. We also need to do the same for har. I think the 
same .raw should work ...

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093538#comment-14093538
 ] 

Sanjay Radia commented on HDFS-6134:


bq. Charles posted a design doc for how distcp will work with encryption at 
HDFS-6509.
 I did a quick glance over it. We also need to do the same for har. I think the 
same .raw should work ...

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093579#comment-14093579
 ] 

Sanjay Radia commented on HDFS-6134:


Wrt webhdfs, the document says that the decryption/encryption will happen in 
the DataNode. 
* Will the DN be able to access the key necessary to do this?
* The data will be transmitted in the clear - is that what we want? For the 
normal HDFS API the decryption/encryption happens at the client side.
* There are two aspects to webhdfs: the REST client and the webhdfs FileSystem. 
Have you considered both use cases?
* Will distcp work via webhdfs? Customers often use webhdfs instead of hdfs 
for cross-cluster copies.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-08-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093583#comment-14093583
 ] 

Sanjay Radia commented on HDFS-6134:


One of the items raised at the meeting and summarized by Owen in his meeting 
minutes comment (June 26) is the scalability concern. How is that being 
addressed? Can a job client get the keys prior to job submission?
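
As an illustration of the job-submission-time idea, here is a minimal sketch; 
the key name and the fetch helper are hypothetical, only the Credentials API 
is real:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class SubmitTimeKeySketch {
  // Hypothetical helper: the submitting user fetches the key material from
  // the key server with their own credentials, not a delegation token.
  static byte[] fetchKeyFromKeyServer(String keyName) {
    return new byte[]{0x01, 0x02, 0x03};  // placeholder material
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "reads-encrypted-input");
    // Stash the key in the job credentials at submission time so tasks never
    // need to present a delegation token to the key server themselves.
    job.getCredentials().addSecretKey(
        new Text("projectX-data-key"), fetchKeyFromKeyServer("projectX-data-key"));
    // job.submit() / job.waitForCompletion(true) would follow as usual.
  }
}
{code}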

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0, 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Charles Lamb
 Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, 
 HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, 
 HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-07-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062502#comment-14062502
 ] 

Sanjay Radia commented on HDFS-6469:


bq. I meant that if you use QJM then every update on the NameNode results in 
writing into two journals: first into edits log and then into QJM journal. 
Konstantine, HDFS has supported parallel journals (i.e., multiple edit logs 
written in parallel) for a long time. A customer can use just QJM (which 
gives at least 3 replicas) and can optionally add a local parallel edit log if 
they want additional redundancy. What you are proposing is dual *serial* 
journals. 
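
For reference, the parallel configuration looks roughly like the following 
minimal sketch; hostnames and paths are placeholders:

{code}
import org.apache.hadoop.conf.Configuration;

public class ParallelJournalConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // QJM journal (at least 3 JournalNodes); written in parallel with any
    // local edits directories below, not serially after them.
    conf.set("dfs.namenode.shared.edits.dir",
        "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster");
    // Optional extra local copy of the edit log for additional redundancy.
    conf.set("dfs.namenode.edits.dir", "file:///data/1/dfs/nn/edits");
    System.out.println(conf.get("dfs.namenode.shared.edits.dir"));
  }
}
{code}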

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-07-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062559#comment-14062559
 ] 

Sanjay Radia commented on HDFS-6469:


Todd said:
bq. a fully usable solution would be available to the community at large, 
whereas the design you're proposing seems like it will only be usably 
implemented by a proprietary extension (I don't consider the ZK reference 
implementation likely to actually work in a usable fashion).

Konstanine I had mentioned exactly the above point to you at the Hadoop summit 
Europe.  ZK is a coordination service and for this to be practical it needs to 
be an inline Paxos protocol. We had also discussed 2 potential  paxos libraries 
 that could come into open source: I believe Facebook has one that they may 
contribute and CMU has one called E-Paxos; if either of these become available 
then it addresses this particular issue. I have no objections to a customer 
going to Wandisco for the enterprise supported  version, but if the community 
is going to maintain such an extension then there needs to a practical, 
in-production-usable  free solution; sending offline messages to a coordinator 
service  for each transaction is not usable. Lets discuss the performance part 
in a separate comment. Let me comment on your comparisons to  the topology and 
windows examples that the community supported in the past:
* Topology - these changes allowed Hadoop to be used on containers such as VMs. 
** Both KVM and VirtualBox offer free VM solutions - the customer does not need 
to buy ESX.  
** The topology solution would will also help with a Docker container 
deployment which is freely available and offers even better performance than 
VMs. 
** Hadoop is commonly used in cloud environment (e.g. AWS, or Azure, or 
Altiscale) which all use VMs or containers
** Further, it was recognized that while, in the past, we had considered racks 
to be a failure zone, that there could be other failure zones: nodes (for the 
case of VMs or containers on a host) and also groups of machines.
* Windows - this was done for platform support which is very different than 
what we are talking about here; many open source solutions support multiple 
platforms to enable the widest adoption. BTW Hadoop supported windows via 
cygwin but we made it first class since the initial support via cygwin was 
messy. 

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-07-01 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049227#comment-14049227
 ] 

Sanjay Radia commented on HDFS-6469:


Wrt double journaling
bq. If I follow your logic correctly QJM being Paxos-based uses a journal by 
itself, so we are not increasing journaling here. When you look at the bigger 
picture we see more journals around. HBase uses WAL along with NN edits, which 
by itself persisted in ext4 a journaling file system.

Konstantine, Todd's point is not that there are multiple journals in the system 
but that every update operation on the NN will result in an entry in two journals: 
HDFS's edit log and the journal used by the ConsensusNode Paxos protocol.  
Your example of HBase's log and the NN log is not a good comparison: every write to 
the HBase WAL does NOT result in an HDFS editlog entry - an entry is made in the 
HDFS editlog ONLY when the WAL crosses a block boundary. 
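
A back-of-the-envelope illustration of that difference, under assumed sizes 
(the block size and average WAL entry size are assumptions, not measurements):

{code}
public class WalToEditlogRatioSketch {
  public static void main(String[] args) {
    long blockSizeBytes = 128L * 1024 * 1024;  // assumed HDFS block size
    long avgWalEntryBytes = 200;               // assumed average HBase WAL append
    long walAppendsPerBlock = blockSizeBytes / avgWalEntryBytes;
    // Under these assumptions only about one NN editlog entry (a new block
    // allocation) is generated per ~671,000 WAL appends, whereas the proposed
    // design adds a consensus-journal entry for *every* namespace update.
    System.out.println("WAL appends per HDFS block boundary: " + walAppendsPerBlock);
  }
}
{code}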

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-26 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045415#comment-14045415
 ] 

Sanjay Radia commented on HDFS-6134:


Noticed the rename restriction for encryption zones. In the past, rename was one 
of the main objections to volumes (i.e., volumes should not restrict renames). I 
think we should bite the bullet and introduce the notion of volumes and use 
encryption as the first use case for volumes (i.e., an encryption zone becomes an 
encrypted volume). Snapshots can also benefit from the volume-rename restriction 
because rename across snapshots is very hard to support.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-24 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042527#comment-14042527
 ] 

Sanjay Radia commented on HDFS-6134:


bq. Can you be a bit more specific on HAR breaking?
HAR copies subtree data into a tar-like structure. HAR lets you access the 
individual files transparently - all the work is done on the client side; the 
NN is not involved and hence will not be able to hand out the encrypted keys 
or key versions. It is possible that HAR can be changed to work, but I am merely 
pointing out that I don't think HAR will work as is with the changes proposed 
in this Jira.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-24 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042636#comment-14042636
 ] 

Sanjay Radia commented on HDFS-6134:


Alejandro - sorry, I should have explained the HAR example better: consider a 
subtree which has a file called E that is encrypted and the rest normal. Now 
the user decides to har the subtree. The file E needs to remain encrypted 
inside the har; also, when E is accessed from the har it needs to be 
transparently decrypted. BTW this might be fixable by changing HAR.
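
For illustration, a minimal sketch of how a client reads a file out of an 
archive today (the archive path and file name are hypothetical); the resolution 
is done entirely by HarFileSystem on the client, so the NN has no hook through 
which it could return key material for E:

{code}
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HarReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical archive; "E" is the encrypted file from the example above.
    Path fileInsideHar = new Path("har:///user/sanjay/subtree.har/E");
    FileSystem harFs = fileInsideHar.getFileSystem(conf);
    // HarFileSystem resolves the archive index and reads the underlying part
    // files on the client side; the NN never sees a per-file open for E, so
    // it has no opportunity to hand out E's key material.
    try (InputStream in = harFs.open(fileInsideHar)) {
      System.out.println("first byte: " + in.read());
    }
  }
}
{code}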

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-24 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042789#comment-14042789
 ] 

Sanjay Radia commented on HDFS-6134:


bq. The NN gives the HDFS client the encrypted DEK \[unique data encryption key 
of the file\] and the keyVersion ID
Alejandro - isn't it sufficient to hand out a key name rather than the encrypted 
DEK?
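
To make the wrapped-key question concrete, here is a minimal, self-contained 
sketch of the flow being discussed, with plain JCE key wrapping standing in 
for the key server; the class and variable names are illustrative, not the 
proposed API:

{code}
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EdekFlowSketch {
  public static void main(String[] args) throws Exception {
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey zoneKey = kg.generateKey();  // lives only in the key server
    SecretKey dek = kg.generateKey();      // per-file data encryption key

    // At file-create time the key server wraps the DEK with the zone key;
    // the NN stores only this EDEK plus the zone-key version name.
    Cipher wrap = Cipher.getInstance("AESWrap");
    wrap.init(Cipher.WRAP_MODE, zoneKey);
    byte[] edek = wrap.wrap(dek);

    // At read time the client presents the EDEK + key version to the key
    // server, which unwraps it; the NN never holds plaintext key material.
    Cipher unwrap = Cipher.getInstance("AESWrap");
    unwrap.init(Cipher.UNWRAP_MODE, zoneKey);
    SecretKey recovered = (SecretKey) unwrap.unwrap(edek, "AES", Cipher.SECRET_KEY);
    System.out.println("DEK recovered: "
        + Arrays.equals(recovered.getEncoded(), dek.getEncoded()));
  }
}
{code}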

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040873#comment-14040873
 ] 

Sanjay Radia commented on HDFS-6134:


Aaron said:
bq. distcp...   I disagree - this is exactly what one wants ..
So you are saying that distcp should decrypt and re-encrypt data as it copies 
it ... most backup tools do not do this as they copy data - it costs extra CPU 
resources and adds further unneeded vulnerability. There are customer use cases 
where distcp is not run over an encrypted channel; hence, if one of the files 
being copied is encrypted, one may not want the file to be transparently 
decrypted and sent in the clear. Further, a sensitive file in a subtree may have 
been encrypted because the subtree is readable by a larger group, and hence the 
distcp user may not have access to the keys. 

bq. delegation tokens - KMS ... Owen and Tucu have already discussed this quite 
a bit above
Turns out this issue came up in discussion with Owen, and he shares the concern 
and suggested that I post the concern. Besides, even if Alejandro and Owen are 
in agreement, my question is relevant and has not been raised so far above: 
encryption is used to overcome limitations of authorization and authentication 
in the system. It is relevant to ask if the use of delegation tokens to obtain 
keys adds weakness. 

bq. meeting ...
Aaron .. you are misunderstanding my point. I am not saying that the discussions 
on this jira have not been open.
* See Alejandro's comments: "Todd Lipcon and I had an offline discussion with 
Andrew Purtell, Yi Liu and Avik Dey" and "After some offline discussions with 
Yi, Tianyou, ATM, Todd, Andrew and Charles ..."
** There have been such meetings and I have *no objections* to such private 
meetings because I know that the bandwidth helps. I am merely asking for one 
more meeting where I can quickly come up to speed on the context that 
Alejandro, Todd, Yi, Tianyou, Andrew, and ATM share. It will help me and others 
better understand the viewpoint that some of you share due to previous 
high-bandwidth meetings.
** There is a precedent of HDFS meetings in spite of open jira discussion - 
higher bandwidth to progress faster.
** Perhaps I should have worded the private meetings differently ... sorry if 
it came across the wrong way.


 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041429#comment-14041429
 ] 

Sanjay Radia commented on HDFS-6134:


I believe the transparent encryption will break the HAR file system.

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041449#comment-14041449
 ] 

Sanjay Radia commented on HDFS-6134:


bq. Vanilla distcp will just work with transparent encryption. Data will be 
decrypted on read and encrypted on write, assuming both source and target are 
in encrypted zones. ...The proposal on changing distcp is to enable a second 
use case.
Alejandro, Aaron, the general practice is not to give the admins running 
distcp access to keys. Hence, as you suggest, we could change distcp so that 
it does not use transparent decryption by default; however, there may be other 
such backup tools and applications that customers and other vendors may have 
written, and we would be breaking them. This may also break the HAR filesystem.

Aaron, you took a very strong position that transparent 
decryption/re-encryption is exactly what one wants. I am missing this - 
what are the use cases for distcp where one wants transparent 
decryption/re-encryption?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037737#comment-14037737
 ] 

Sanjay Radia commented on HDFS-6134:


bq. On the distcp not accessing the keys (not decrypting/encrypting), yes, that 
is the idea.
Alejandro, I am not sure I understand what you mean by the above. Are you saying 
that distcp and other tools/applications that copy and back up data will have to 
be changed to do something different when the file is encrypted?
In a sense, this Jira's attempt to provide transparent encryption is breaking 
existing transparency.

Two other questions:
* Are you relying on the Kerberos credentials OR delegation tokens to 
obtain the keys? Isn't using the delegation token to obtain keys reducing 
security?
* Looks like the proposal relies on file ACLs to hand out keys - part of the 
motivation for using encryption is that ACLs are often not correctly set. 

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataAtRestEncryption.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038099#comment-14038099
 ] 

Sanjay Radia commented on HDFS-6134:


* distcp and such tools and applications
bq. Vanilla distcp will just work with transparent encryption.
This is not what one wants - distcp will not necessarily have permission to 
decrypt.

* delegation tokens - KMS will accept delegation tokens - again I don't think 
this is what one wants - can the keys be obtained at job submission time?

* File ACLs
bq. The NN gives the HDFS client the encrypted DEK and the keyVersion ID.
I assume the NN will hand this out based on the file ACL. Does the above reduce 
the security?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataAtRestEncryption.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-06-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038106#comment-14038106
 ] 

Sanjay Radia commented on HDFS-6134:


There is a complex set of issues to be addressed. I know that a bunch of you 
have had some private meetings discussing the various options and tradeoffs. 
Can we please have a short, more public meeting next week? I can organize and 
host this at Hortonworks, along with Google Plus for those that are remote. How 
about next Thursday at 1:30pm?

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataAtRestEncryption.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the health­care industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5851) Support memory as a storage medium

2014-06-18 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5851:
---

Attachment: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf

Slightly updated doc on DDM:
* Clarified the separation of mechanism from DDM policy, as per Colin's and my 
comment on being able to create memory-cached files anywhere in the HDFS 
namespace.
* Explained how DDMs fit with materialized queries (Julian's DIMMQ).
* Updated references marked TBD.
* Minor improvements to the RDD/Tachyon comparison.
Note this text was written prior to Ali and Li's comment and hence does 
not address their concern. I am rereading the Tachyon paper, will be meeting 
Li over the next couple of days, and will update further as needed.

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-06-17 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034632#comment-14034632
 ] 

Sanjay Radia commented on HDFS-5851:


Colin wrote:
bq. why a separate namespace under hdfs://namespace/.reserved/ddm ? We have 
xattrs now, so files ...
I did not explain it well. It is a separation of policy and mechanism. HDFS has 
to support such files for ANY name; hence we can use an xattr to mark a file as 
write-cached.

The policy of managing the memory space and the underlying swap space (e.g. 
hdfs://namespace/.reserved/ddm) is separate from the write-cache mechanism that 
HDFS needs to support in ANY part of its namespace; so I believe we are in 
agreement here. I will explain the policy I am proposing in a separate comment.
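
As a concrete illustration of the mechanism half, marking an arbitrary file as 
write-cached via an xattr; the xattr name and the path here are made up, only 
the xattr API itself is real:

{code}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DdmXattrSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/projects/etl/tmp/shuffle-0001");  // any path in the namespace
    // Hypothetical xattr: the write-cache mechanism applies to ANY file,
    // while the policy of managing memory/swap space stays separate.
    fs.setXAttr(p, "user.ddm.write-cache", "true".getBytes(StandardCharsets.UTF_8));
    byte[] v = fs.getXAttr(p, "user.ddm.write-cache");
    System.out.println(new String(v, StandardCharsets.UTF_8));
  }
}
{code}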

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5851) Support memory as a storage medium

2014-04-29 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984544#comment-13984544
 ] 

Sanjay Radia commented on HDFS-5851:


BTW we will host the meeting at Hortonworks for those that are local and want 
to attend in person:
Hortonworks
3460 W. Bayshore Rd
Palo Alto CA 94303


 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-5851) Support memory as a storage medium

2014-04-28 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981608#comment-13981608
 ] 

Sanjay Radia edited comment on HDFS-5851 at 4/28/14 10:23 PM:
--

Added comparison to Tachyon in the doc. There is also an implementation 
difference that I don't cover (Tachyon I believe uses RamFs rather than 
memory that is mapped to an HDFS file -- but I need to verify that).

I have reproduced the text from the updated doc here for convenience:
Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is 
outside the address space of an application and allows sharing RDDs across 
applications. Both Tachyon and DDMs use memory-mapped files and lazy writing to 
reduce the need to recompute. Tachyon, since it is an RDD implementation, 
records the computation in order to regenerate the data in case of loss, whereas 
DDMs rely on the application to regenerate. Tachyon and RDDs do not have a 
notion of discardability, which is fundamental to DDMs, where data can be 
discarded when it is under memory and/or backing store pressure. DDMs are 
closest to virtual memory/anti-caching in that they virtualize memory, with the 
twist that data can be discarded.



was (Author: sanjay.radia):
Added comparison to Tachyon in the doc. The is also an implementation 
difference that I don't cover (Tachyon I believe uses RamFs rather than a 
memory that is mapped to a HDFS file -- but need to verify that).

I have reproduced the text from the updated doc here for convenience:
Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is 
outside the address space of an application and allows sharing RDDs across 
applications. Both Tachyon and DDMs use memory mapped files and lazy writing to 
reduce the need to recompute. Tachyon, since it is an RDD implementation, 
records the computation in order to regenerate the data in case of loss whereas 
DDMs relies on the application to regenerate. Tachyon and RDDs do not have a 
notion of discardability, which is fundamental to DDMs where data can be 
discarded when it is under memory and/or backing store pressure.


 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5851) Support memory as a storage medium

2014-04-25 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5851:
---

Attachment: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf

Added comparison to Tachyon in the doc. There is also an implementation 
difference that I don't cover (Tachyon I believe uses RamFs rather than 
memory that is mapped to an HDFS file -- but I need to verify that).

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-5851) Support memory as a storage medium

2014-04-25 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981608#comment-13981608
 ] 

Sanjay Radia edited comment on HDFS-5851 at 4/25/14 9:10 PM:
-

Added comparison to Tachyon in the doc. There is also an implementation 
difference that I don't cover (Tachyon I believe uses RamFs rather than 
memory that is mapped to an HDFS file -- but I need to verify that).

I have reproduced the text from the updated doc here for convenience:
Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is 
outside the address space of an application and allows sharing RDDs across 
applications. Both Tachyon and DDMs use memory-mapped files and lazy writing to 
reduce the need to recompute. Tachyon, since it is an RDD implementation, 
records the computation in order to regenerate the data in case of loss, whereas 
DDMs rely on the application to regenerate. Tachyon and RDDs do not have a 
notion of discardability, which is fundamental to DDMs, where data can be 
discarded when it is under memory and/or backing store pressure.



was (Author: sanjay.radia):
Added comparison to Tachyon in the doc. The is also an implementation 
difference that I don't cover (Tachyon I believe uses RamFs rather than a 
memory that is mapped to a HDFS file -- but need to verify that).

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5851) Support memory as a storage medium

2014-04-24 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5851:
---

Attachment: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf

Please see the attached document that identifies some use cases and a proposal for 
using memory for intermediate data. We introduce the notion of Discardable 
Distributed Memory (DDM), which exploits the property that the data can be 
reconstructed. Further, by using HDFS files as a backing store to which DDM 
data is lazily written, we give the impression of a much larger memory size and 
also give the system an additional degree of freedom to manage the scarce 
memory resource. The main implementation mechanism is memory-mapped files that 
are lazily replicated; this mechanism provides weak persistence, which may have 
other direct use cases beyond DDMs.
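
A minimal local sketch of that core mechanism, with a plain java.nio 
memory-mapped file standing in for the HDFS-backed segment; the path, size, 
and data are arbitrary:

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class DdmMmapSketch {
  public static void main(String[] args) throws Exception {
    try (RandomAccessFile backing = new RandomAccessFile("/tmp/ddm-segment", "rw");
         FileChannel ch = backing.getChannel()) {
      MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64 * 1024);
      region.put("intermediate shuffle partition 0".getBytes(StandardCharsets.UTF_8));
      // Writes so far live in memory only; force() is the "lazy write" step
      // that the DDM layer would schedule when convenient, or skip entirely
      // and discard the data, asking the application to regenerate it.
      region.force();
    }
  }
}
{code}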

 Support memory as a storage medium
 --

 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: 
 SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf


 Memory can be used as a storage medium for smaller/transient files for fast 
 write throughput.
 More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6160) TestSafeMode occasionally fails

2014-04-09 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964250#comment-13964250
 ] 

Sanjay Radia commented on HDFS-6160:


+1

 TestSafeMode occasionally fails
 ---

 Key: HDFS-6160
 URL: https://issues.apache.org/jira/browse/HDFS-6160
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Ted Yu
Assignee: Arpit Agarwal
 Attachments: HDFS-6160.01.patch


 From 
 https://builds.apache.org/job/PreCommit-HDFS-Build/6511//testReport/org.apache.hadoop.hdfs/TestSafeMode/testInitializeReplQueuesEarly/
  :
 {code}
 java.lang.AssertionError: expected:<13> but was:<0>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at 
 org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:212)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5477) Block manager as a service

2014-03-04 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920076#comment-13920076
 ] 

Sanjay Radia commented on HDFS-5477:


I read both documents; some key details are missing (perhaps in the patches, 
but they need to be in the document or jira). A small hypothetical sketch 
follows this list to make a couple of the questions concrete.
* How does the BM know the replication factor of a block? The document tends to 
suggest that the BM syncs with the NN on its start. Is this a sort of reverse 
block report from NN to BM, where the NN tells the BM its list of 
blocks and their replication factor?
** In particular, one would like the BM's state to exist independently of the 
NN, so that if one or more NNs are shut down for long periods of time (such as 
unmounting a namespace) then blocks and their replicas are still managed. Would 
this be possible under your design? If not, what would it take to support it?
* How would one support file affinity? For example, there are several use 
cases (esp. for HBase) where one co-locates the block replicas of multiple 
files together. How would you support such a feature?
* You briefly reference fine-grained locking in the NN - does your design 
require that the NN hold certain fine-grained locks while threads in the NN make 
a remote call to the BM? Can you please detail these in the context of specific 
operations in the document and/or jira.
* How do you plan to handle files and blocks under construction? There is 
delicate code that finalizes the sizes of blocks under construction, especially 
in the face of NN and DN restarts.
* Please describe how you would support the Balancer in your new design.
* Your design does not change client APIs, and hence you must be forwarding 
block location requests via the NN to the BM. Can you describe how one could 
direct clients to go directly to the BM for block locations when appropriate? 
(I believe this can be done in an API-compatible way and also, with protocol 
changes, in a backward-compatible way.)
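
The promised sketch - a purely hypothetical interface, none of these names 
exist in the patches, just to pin down what the "reverse block report" and 
replication-factor questions are asking about:

{code}
import java.util.List;

// Hypothetical shape of a standalone block manager service.
interface BlockManagerService {
  /** NN -> BM on (re)connect: the NN's view of its blocks and their expected
   *  replication, so BM state can outlive a long NN shutdown. */
  void reportExpectedBlocks(String namespaceId, List<ExpectedBlock> blocks);

  /** Client/NN path for locating replicas of a block. */
  List<String> getBlockLocations(long blockId);

  /** Co-location hint for file-affinity use cases (e.g. HBase). */
  void setAffinityGroup(long blockId, String affinityGroup);
}

class ExpectedBlock {
  final long blockId;
  final short replication;
  ExpectedBlock(long blockId, short replication) {
    this.blockId = blockId;
    this.replication = replication;
  }
}
{code}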

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, 
 Standalone BM.pdf, patches.tar.gz


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS

2014-01-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880440#comment-13880440
 ] 

Sanjay Radia commented on HDFS-4685:


Comment on the two alternatives for the default ACL proposals in the doc. 
Reproducing the text for convenience.

* *Umask-Default-ACL*: The default ACL of the parent is cloned to the ACL of 
the child at the time of child creation.  For new child directories, the default 
ACL itself is also cloned, so that the same policy is applied to 
sub-directories of sub-directories.  Subsequent changes to the parent's default 
ACL will set a different ACL for new children, but will not alter existing 
children.  This matches POSIX behavior.  If the administrator wants to change 
policy on the sub-tree later, then this is performed by inserting a new, more 
restrictive ACL entry at the appropriate sub-tree root (see UC6) and may also 
require a recursive ACL modification (analogous to chmod -R), since 
existing children are not affected by the new ACL.

* *Inherited-Default-ACL*: A child that does not have an ACL of its own 
inherits its ACL from the nearest ancestor that has defined a default ACL.  A 
child node that requires a different ACL can override the default (like the 
Umask-Default-ACL). Subsequent changes to the ancestor's default ACL will cause 
all children that do not have an ACL to inherit the new ACL regardless of child 
creation time (unlike Umask-Default-ACL).   This model, like the ABAC ACLs (use 
case UC8), encourages the user to create fewer ACLs (typically on the roots of 
specific subtrees), while the POSIX-compliant Umask-Default-ACL is expected to 
result in a larger number of ACLs in the system. It would also make a memory-
efficient implementation trivial. Note that this model is a deviation from 
POSIX behavior. 



Consider the following three sub use cases here:
4a) Open up a child for wider access than the default.
4b) Restrict a child to narrower access than the default.
4c) Change the default ACL because you made a mistake originally.

Both models support use cases 4a and 4b with equal ease. However, with the 
Inherited-Default-ACL, it is easy to identify children that have overridden the 
default ACL - the existence of an ACL means that the user intended to override 
the default. Also, 4c is a natural fit for Inherited-Default-ACL. For the 
Umask-Default-ACL, every child has an ACL, and hence you have to walk down the 
subtree and compare each ACL with the default to see if the user had intended to 
override it.


I think the Inherited-Default-ACL is a much better design, but POSIX compliance 
may triumph, and hence I am willing to go with Umask-Default-ACL.
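
For concreteness, a minimal sketch of setting a DEFAULT-scope entry on a 
subtree root with the builder-style ACL API; the path and group name are 
hypothetical:

{code}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class DefaultAclSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path subtreeRoot = new Path("/projects/finance");  // hypothetical subtree root
    // Under the Umask-Default-ACL model, this DEFAULT-scope entry is cloned
    // onto each child at creation time; later changes to it do not touch
    // existing children, hence the chmod -R style fix-up when the policy on
    // the subtree changes.
    AclEntry defaultGroupRead = new AclEntry.Builder()
        .setScope(AclEntryScope.DEFAULT)
        .setType(AclEntryType.GROUP)
        .setName("analysts")
        .setPermission(FsAction.READ_EXECUTE)
        .build();
    fs.modifyAclEntries(subtreeRoot, Arrays.asList(defaultGroupRead));
  }
}
{code}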

 Implementation of ACLs in HDFS
 --

 Key: HDFS-4685
 URL: https://issues.apache.org/jira/browse/HDFS-4685
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client, namenode, security
Affects Versions: 1.1.2
Reporter: Sachin Jose
Assignee: Chris Nauroth
 Attachments: HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf


 Currenly hdfs doesn't support Extended file ACL. In unix extended ACL can be 
 achieved using getfacl and setfacl utilities. Is there anybody working on 
 this feature ?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory

2014-01-03 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5389:
---

Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-2362

 A Namenode that keeps only a part of the namespace in memory
 

 Key: HDFS-5389
 URL: https://issues.apache.org/jira/browse/HDFS-5389
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 0.23.1
Reporter: Lin Xiao
Priority: Minor

 *Background:*
 Currently, the NN Keeps all its namespace in memory. This has had the benefit 
 that the NN code is very simple and, more importantly, helps the NN scale to 
 over 4.5K machines with 60K  to 100K concurrently tasks.  HDFS namespace can 
 be scaled currently using more Ram on the NN and/or using Federation which 
 scales both namespace and performance. The current federation implementation 
 does not allow renames across volumes without data copying but there are 
 proposals to remove that limitation.
 *Motivation:*
  Hadoop lets customers store huge amounts of data at very economical prices 
 and hence allows customers to store their data for several years. While most 
 customers perform analytics on recent  data (last hour, day, week, months, 
 quarter, year), the ability to have five year old data online for analytics 
 is very attractive for many businesses. Although one can use larger RAM in a 
 NN and/or use Federation, it not really necessary to store the entire 
 namespace in memory since only the recent data is typically heavily accessed. 
 *Proposed Solution:*
 Store a portion of the NN's namespace in memory- the working set of the 
 applications that are currently operating. LSM data structures are quite 
 appropriate for maintaining the full namespace in memory. One choice is 
 Google's LevelDB open-source implementation.
 *Benefits:*
  *  Store larger namespaces without resorting to Federated namespace volumes.
  * Complementary to NN Federated namespace volumes,  indeed will allow a 
 single NN to easily store multiple larger volumes.
  *  Faster cold startup - the NN does not have read its full namespace before 
 responding to clients.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory

2014-01-03 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861940#comment-13861940
 ] 

Sanjay Radia commented on HDFS-5389:


Lin will shortly post a link to her prototype code on GitHub. Her prototype is 
based on HDFS 0.23.

 A Namenode that keeps only a part of the namespace in memory
 

 Key: HDFS-5389
 URL: https://issues.apache.org/jira/browse/HDFS-5389
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 0.23.1
Reporter: Lin Xiao
Priority: Minor

 *Background:*
 Currently, the NN Keeps all its namespace in memory. This has had the benefit 
 that the NN code is very simple and, more importantly, helps the NN scale to 
 over 4.5K machines with 60K  to 100K concurrently tasks.  HDFS namespace can 
 be scaled currently using more Ram on the NN and/or using Federation which 
 scales both namespace and performance. The current federation implementation 
 does not allow renames across volumes without data copying but there are 
 proposals to remove that limitation.
 *Motivation:*
  Hadoop lets customers store huge amounts of data at very economical prices 
 and hence allows customers to store their data for several years. While most 
 customers perform analytics on recent  data (last hour, day, week, months, 
 quarter, year), the ability to have five year old data online for analytics 
 is very attractive for many businesses. Although one can use larger RAM in a 
 NN and/or use Federation, it not really necessary to store the entire 
 namespace in memory since only the recent data is typically heavily accessed. 
 *Proposed Solution:*
 Store a portion of the NN's namespace in memory- the working set of the 
 applications that are currently operating. LSM data structures are quite 
 appropriate for maintaining the full namespace in memory. One choice is 
 Google's LevelDB open-source implementation.
 *Benefits:*
  *  Store larger namespaces without resorting to Federated namespace volumes.
  * Complementary to NN Federated namespace volumes,  indeed will allow a 
 single NN to easily store multiple larger volumes.
  *  Faster cold startup - the NN does not have read its full namespace before 
 responding to clients.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.

2014-01-02 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860846#comment-13860846
 ] 

Sanjay Radia commented on HDFS-5711:


bq. We also intend to use LevelDB to persist metadata, and plan to provide a 
complete solution, by not just persisting the Namespace information but also 
the Blocks Map onto secondary storage. 

The namespace layer and block layer have been separated reasonably well as part 
of the federation work. Hence it makes sense to keep the persistence of the 
namespace and the persistence of the block map as two separate jiras. 
Since there is already a jira, HDFS-5389, focusing on storing only a portion (the 
working set) of the namespace in memory, this jira should focus on 
storing the block mapping on disk (as stated in the title).
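
As a rough illustration only, assuming the leveldbjni bindings already used 
elsewhere in Hadoop and a made-up key/value encoding, persisting the block to 
locations mapping in LevelDB might look like:

{code}
import java.io.File;
import java.nio.ByteBuffer;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

public class BlockMapLevelDbSketch {
  public static void main(String[] args) throws Exception {
    Options options = new Options().createIfMissing(true);
    // LevelDB keeps the hot part of the mapping in its in-memory cache while
    // the full block map lives on secondary storage.
    try (DB db = factory.open(new File("/tmp/blockmap-leveldb"), options)) {
      long blockId = 1073741825L;
      byte[] key = ByteBuffer.allocate(8).putLong(blockId).array();
      byte[] locations = "dn1:50010,dn4:50010,dn7:50010".getBytes("UTF-8");
      db.put(key, locations);
      System.out.println(new String(db.get(key), "UTF-8"));
    }
  }
}
{code}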



 Removing memory limitation of the Namenode by persisting Block - Block 
 location mappings to disk.
 -

 Key: HDFS-5711
 URL: https://issues.apache.org/jira/browse/HDFS-5711
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Rohan Pasalkar

 This jira is to track changes to be made to remove HDFS name-node memory 
 limitation to hold block - block location mappings.
 It is a known fact that the single Name-node architecture of HDFS has 
 scalability limits. The HDFS federation project alleviates this problem by 
 using horizontal scaling. This helps increase the throughput of metadata 
 operation and also the amount of data that can be stored in a Hadoop cluster.
 The Name-node stores all the filesystem metadata in memory (even in the 
 federated architecture), the
 Name-node design can be enhanced by persisting part of the metadata onto 
 secondary storage and retaining 
 the popular or recently accessed metadata information in main memory. This 
 design can benefit a HDFS deployment
 which doesn't use federation but needs to store a large number of files or 
 large number of blocks. Lin Xiao from Hortonworks attempted a similar
 project [1] in the Summer of 2013. They used LevelDB to persist the Namespace 
 information (i.e file and directory inode information).
 A patch with this change is yet to be submitted to code base. We also intend 
 to use LevelDB to persist metadata, and plan to 
 provide a complete solution, by not just persisting  the Namespace 
 information but also the Blocks Map onto secondary storage. 
 We did implement the basic prototype which stores the block-block location 
 mapping metadata to the persistent key-value store i.e. levelDB. Prototype 
 also maintains the in-memory cache of the recently used block-block location 
 mappings metadata. 
 References:
 [1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, 
 http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory

2013-10-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5389:
---

Summary: A Namenode that keeps only a part of the namespace in memory  
(was: Remove INode limitations in Namenode)

 A Namenode that keeps only a part of the namespace in memory
 

 Key: HDFS-5389
 URL: https://issues.apache.org/jira/browse/HDFS-5389
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 0.23.1
Reporter: Lin Xiao
Priority: Minor

 Current HDFS Namenode stores all of its metadata in RAM. This has allowed 
 Hadoop clusters to scale to 100K concurrent tasks. However, the memory limits 
 the total number of files that a single NN can store. While Federation allows 
 one to create multiple volumes with additional Namenodes, there is a need to 
 scale a single namespace and also to store multiple namespaces in a single 
 Namenode. When inodes are also stored on persistent storage, the system's 
 boot time can be significantly reduced because there is no need to replay 
 edit logs. It also provides the potential to support extended attributes once 
 the memory size is not the bottleneck.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory

2013-10-19 Thread Sanjay Radia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Radia updated HDFS-5389:
---

Description: 
*Background:*
Currently, the NN Keeps all its namespace in memory. This has had the benefit 
that the NN code is very simple and, more importantly, helps the NN scale to 
over 4.5K machines with 60K  to 100K concurrently tasks.  HDFS namespace can be 
scaled currently using more Ram on the NN and/or using Federation which scales 
both namespace and performance. The current federation implementation does not 
allow renames across volumes without data copying but there are proposals to 
remove that limitation.

*Motivation:*
 Hadoop lets customers store huge amounts of data at very economical prices and 
hence allows customers to store their data for several years. While most 
customers perform analytics on recent data (last hour, day, week, month, 
quarter, year), the ability to have five-year-old data online for analytics is 
very attractive for many businesses. Although one can use larger RAM in a NN 
and/or use Federation, it is not really necessary to store the entire namespace 
in memory since only the recent data is typically heavily accessed. 

*Proposed Solution:*
Store a portion of the NN's namespace in memory - the working set of the 
applications that are currently operating. LSM data structures are quite 
appropriate for maintaining the full namespace on secondary storage while 
keeping only the working set in memory. One choice is Google's LevelDB 
open-source implementation. A rough sketch of this working-set lookup follows 
the benefits list below.

*Benefits:*
 *  Store larger namespaces without resorting to Federated namespace volumes.
 * Complementary to NN Federated namespace volumes; indeed it will allow a single 
NN to easily store multiple larger volumes.
 *  Faster cold startup - the NN does not have to read its full namespace before 
responding to clients.
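
As referenced above, here is a rough sketch of the working-set lookup, assuming 
directory entries keyed by (parent inode id, child name) in an on-disk key-value 
store such as LevelDB. All names and the hard-coded capacities below are 
illustrative assumptions, not the prototype's actual design.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of path resolution against a namespace kept on secondary storage,
 *  with only the working set of inodes cached in memory. All names here are
 *  hypothetical and purely for illustration. */
class NamespaceSketch {
  /** Stand-in for an on-disk key-value store such as LevelDB. */
  interface KeyValueStore {
    Long getInodeId(long parentId, String childName);
  }

  private final KeyValueStore onDisk;
  // Working set: (parentId + "/" + name) -> inode id, evicted in LRU order.
  private final Map<String, Long> workingSet =
      new LinkedHashMap<String, Long>(16, 0.75f, true) {
        @Override protected boolean removeEldestEntry(Map.Entry<String, Long> e) {
          return size() > 1_000_000;   // example capacity, not a tuned value
        }
      };

  NamespaceSketch(KeyValueStore onDisk) {
    this.onDisk = onDisk;
  }

  /** Resolve an absolute path ("/a/b/c") to an inode id, or null if absent. */
  Long resolve(String path) {
    long current = 1L;                 // assume inode id 1 is the root
    for (String component : path.split("/")) {
      if (component.isEmpty()) {
        continue;
      }
      String key = current + "/" + component;
      Long child = workingSet.get(key);
      if (child == null) {
        child = onDisk.getInodeId(current, component);  // fall back to disk
        if (child == null) {
          return null;
        }
        workingSet.put(key, child);    // pull the entry into the working set
      }
      current = child;
    }
    return current;
  }
}
{code}

Only the recently resolved entries stay in memory; everything else is served 
from secondary storage, which is exactly the working-set behaviour proposed 
above.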


  was:
Current HDFS Namenode stores all of its metadata in RAM. This has allowed 
Hadoop clusters to scale to 100K concurrent tasks. However, the memory limits 
the total number of files that a single NN can store. While Federation allows 
one to create multiple volumes with additional Namenodes, there is a need to 
scale a single namespace and also to store multiple namespaces in a single 
Namenode. When inodes are also stored on persistent storage, the system's boot 
time can be significantly reduced because there is no need to replay edit logs. 
It also provides the potential to support extended attributes once the memory 
size is not the bottleneck.



 A Namenode that keeps only a part of the namespace in memory
 

 Key: HDFS-5389
 URL: https://issues.apache.org/jira/browse/HDFS-5389
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 0.23.1
Reporter: Lin Xiao
Priority: Minor

 *Background:*
 Currently, the NN keeps all its namespace in memory. This has had the benefit 
 that the NN code is very simple and, more importantly, helps the NN scale to 
 over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can 
 currently be scaled using more RAM on the NN and/or using Federation, which 
 scales both namespace and performance. The current federation implementation 
 does not allow renames across volumes without data copying, but there are 
 proposals to remove that limitation.
 *Motivation:*
  Hadoop lets customers store huge amounts of data at very economical prices 
 and hence allows customers to store their data for several years. While most 
 customers perform analytics on recent data (last hour, day, week, month, 
 quarter, year), the ability to have five-year-old data online for analytics 
 is very attractive for many businesses. Although one can use larger RAM in a 
 NN and/or use Federation, it is not really necessary to store the entire 
 namespace in memory since only the recent data is typically heavily accessed. 
 *Proposed Solution:*
 Store a portion of the NN's namespace in memory - the working set of the 
 applications that are currently operating. LSM data structures are quite 
 appropriate for maintaining the full namespace on secondary storage while 
 keeping only the working set in memory. One choice is Google's LevelDB 
 open-source implementation.
 *Benefits:*
  *  Store larger namespaces without resorting to Federated namespace volumes.
  * Complementary to NN Federated namespace volumes; indeed it will allow a 
 single NN to easily store multiple larger volumes.
  *  Faster cold startup - the NN does not have to read its full namespace 
 before responding to clients.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory

2013-10-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800015#comment-13800015
 ] 

Sanjay Radia commented on HDFS-5389:


Lin built a prototype as an intern at Hortonworks in 2013. Her 
[slides|http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation]
 presented at the Hadoop User Group (HUG) in August 2013 describe the approach 
and some early results.


 A Namenode that keeps only a part of the namespace in memory
 

 Key: HDFS-5389
 URL: https://issues.apache.org/jira/browse/HDFS-5389
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 0.23.1
Reporter: Lin Xiao
Priority: Minor

 *Background:*
 Currently, the NN keeps all its namespace in memory. This has had the benefit 
 that the NN code is very simple and, more importantly, helps the NN scale to 
 over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can 
 currently be scaled using more RAM on the NN and/or using Federation, which 
 scales both namespace and performance. The current federation implementation 
 does not allow renames across volumes without data copying, but there are 
 proposals to remove that limitation.
 *Motivation:*
  Hadoop lets customers store huge amounts of data at very economical prices 
 and hence allows customers to store their data for several years. While most 
 customers perform analytics on recent data (last hour, day, week, month, 
 quarter, year), the ability to have five-year-old data online for analytics 
 is very attractive for many businesses. Although one can use larger RAM in a 
 NN and/or use Federation, it is not really necessary to store the entire 
 namespace in memory since only the recent data is typically heavily accessed. 
 *Proposed Solution:*
 Store a portion of the NN's namespace in memory - the working set of the 
 applications that are currently operating. LSM data structures are quite 
 appropriate for maintaining the full namespace on secondary storage while 
 keeping only the working set in memory. One choice is Google's LevelDB 
 open-source implementation.
 *Benefits:*
  *  Store larger namespaces without resorting to Federated namespace volumes.
  * Complementary to NN Federated namespace volumes; indeed it will allow a 
 single NN to easily store multiple larger volumes.
  *  Faster cold startup - the NN does not have to read its full namespace 
 before responding to clients.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2013-10-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800022#comment-13800022
 ] 

Sanjay Radia commented on HDFS-5324:


Lin, a summer intern at Hortonworks, prototyped a NN that stores only the 
working set in memory (HDFS-5389). She made changes directly to NN code. 

 Make Namespace implementation pluggable in the namenode
 ---

 Key: HDFS-5324
 URL: https://issues.apache.org/jira/browse/HDFS-5324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.1-beta
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 3.0.0

 Attachments: AbstractNamesystem.java


 For the last couple of months, we have been working on making the Namespace
 implementation in the namenode pluggable. We have demonstrated that it can
 be done without major surgery on the namenode and that it does not have a
 noticeable performance impact. We would like to contribute it back to Apache HDFS.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2013-10-09 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790652#comment-13790652
 ] 

Sanjay Radia commented on HDFS-5324:


I am copying my comment from  hdfs-dev:

HDFS pluggability (and relation to pluggability added as part of Federation)
 * Pluggability and federation are orthogonal, although we did improve the 
pluggability of HDFS as part of the federation implementation. The *block layer* 
was separated out as part of the federation work and hence makes the general 
development of new HDFS namespace implementations easier. Federation's 
pluggability was targeted towards someone writing a new NN and reusing the 
block storage layer via a library and *optionally* living side-by-side with 
different implementations of the NN *within the same cluster*. Hence we added 
the notion of block pools and separated out the block management layer.
* So your proposed work is clearly not in conflict with Federation or even with 
the pluggability that Federation added; philosophically, your proposal is 
complementary. 

Considerations: A Public API?
The FileSystem/AbstractFileSystem APIs and the newly proposed 
AbstractFSNamesystem are targeting very different kinds of pluggability into 
Hadoop. The former takes a thin application API (FileSystem and FileContext) 
and makes it easy for users to plug in different filesystems (S3, LocalFS, etc.) 
as Hadoop-compatible filesystems. In contrast, the latter (the proposed 
AbstractFSNamesystem) is a fatter interface inside the depths of the HDFS 
implementation and makes parts of that implementation pluggable. 

I would not make your proposed AbstractFSNamesystem a public stable Hadoop API 
but instead direct it towards HDFS developers who want to extend the 
implementation of HDFS more easily. Were you envisioning the 
AbstractFSNamesystem to be a stable public Hadoop API? If someone has their own 
private implementation of this new abstract class, would the HDFS community 
have the freedom to modify the abstract class in incompatible ways? 
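
To make the thin-vs-fat distinction concrete, here is a rough, hypothetical 
sketch of the kind of internal extension point being discussed. It is not the 
AbstractNamesystem.java attached to this JIRA; every name and signature below 
is an assumption made purely for illustration.

{code:java}
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of an internal, developer-facing namespace extension
 *  point; NOT the AbstractNamesystem.java attached to this JIRA. */
public abstract class NamespacePluginSketch {
  /** Create a file entry; returns the block ids allocated for it so far. */
  public abstract List<Long> createFile(String path, short replication)
      throws IOException;

  /** Look up the block ids of a file; throws if the path does not exist. */
  public abstract List<Long> getBlocks(String path) throws IOException;

  /** Record that a new block was allocated for a file under construction. */
  public abstract void addBlock(String path, long blockId) throws IOException;

  /** Remove a file or (recursively) a directory from the namespace. */
  public abstract boolean delete(String path, boolean recursive)
      throws IOException;

  // A fat internal interface like this is free to change between HDFS
  // releases; unlike FileSystem/FileContext it would not be a stable
  // public API.
}
{code}

A plug-in written against such a class is tied to NN internals, which is why it 
reads more like a developer extension point than a public Hadoop API.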

 Make Namespace implementation pluggable in the namenode
 ---

 Key: HDFS-5324
 URL: https://issues.apache.org/jira/browse/HDFS-5324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.1-beta
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 3.0.0


 For the last couple of months, we have been working on making the Namespace
 implementation in the namenode pluggable. We have demonstrated that it can
 be done without major surgery on the namenode and that it does not have a
 noticeable performance impact. We would like to contribute it back to Apache HDFS.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-4685) Implementation of extended file acl in hdfs

2013-08-22 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747891#comment-13747891
 ] 

Sanjay Radia commented on HDFS-4685:


When we added permissions to HDFS in Hadoop 0.16 we had originally considered 
having only ACLs for directories and not doing Unix-like permissions. Then, due 
to lack of time, we decided to go with Unix-style permissions. At that time we 
had proposed the following:
* Any directory can have an ACL.
* An ACL specifies the list of users and groups that can access that 
directory's subtree. As you resolve a path you take the most stringent 
permission - i.e. the most restrictive permission (a small sketch of this rule 
follows below).
* An ACL also specifies who can change the ACL, i.e. changes to ACLs can be done 
by others besides the owner. 
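
Here is the small sketch of the most-restrictive resolution rule mentioned 
above, assuming a hypothetical per-directory ACL lookup; the types and method 
names are illustrative only.

{code:java}
import java.util.List;

/** Sketch: resolve access along a path, taking the most restrictive answer.
 *  DirAcl and AclLookup are hypothetical stand-ins for this illustration. */
class AclResolutionSketch {
  interface DirAcl {
    /** Whether this directory's ACL lets the user (or one of its groups) in. */
    boolean permits(String user, List<String> groups);
  }

  interface AclLookup {
    /** ACL attached to a directory, or null when the directory has none. */
    DirAcl aclFor(String directoryPath);
  }

  /** Access is granted only if every ancestor directory's ACL permits it. */
  static boolean canAccess(String path, String user, List<String> groups,
                           AclLookup lookup) {
    StringBuilder current = new StringBuilder();
    for (String component : path.split("/")) {
      if (component.isEmpty()) {
        continue;
      }
      current.append('/').append(component);
      DirAcl acl = lookup.aclFor(current.toString());
      if (acl != null && !acl.permits(user, groups)) {
        return false;   // the most restrictive ACL on the path wins
      }
    }
    return true;
  }
}
{code}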

 Implementation of extended file acl in hdfs
 ---

 Key: HDFS-4685
 URL: https://issues.apache.org/jira/browse/HDFS-4685
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, namenode, security
Affects Versions: 1.1.2
Reporter: Sachin Jose
Assignee: Chris Nauroth
Priority: Minor

 Currently HDFS doesn't support extended file ACLs. In Unix, extended ACLs can 
 be managed using the getfacl and setfacl utilities. Is anybody working on 
 this feature?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS

2013-07-18 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712529#comment-13712529
 ] 

Sanjay Radia commented on HDFS-4949:


Caching partial blocks: There is no problem with a DN caching only the hot 
parts of a block and still declaring to the NN that the block is cached in RAM. 
This would fit in with the proposal of abstracting RAM copies as replicas. The 
use case that does not fit is where one DataNode, DN1, has cached the first 100 
bytes and another DataNode, DN2, has cached the last 100 bytes, and you want the 
client to go to the right DataNode based on what portion of the file it is 
reading. If and when we finally get to caching portions and we want to support 
the use case mentioned, we could, at that time, consider having the block info 
sent for RAM replicas indicate which portions are cached -- this would mean that 
certain replicas carry additional information in the block map.

Given that we are not caching portions of blocks for this Jira, and that for 
tiered storage on SSDs we want to add the device info to the block location, I 
suggest that we proceed with abstracting RAM copies as replicas and revisit the 
decision about partial-block caching at a later point.
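
If partial-block caching were ever supported, the extra per-replica information 
might look roughly like the following hypothetical record; this is illustrative 
only, and nothing like it exists in the block map today.

{code:java}
/** Hypothetical per-replica record for a future partial-block caching scheme;
 *  illustrative only, not part of the current block map. */
class RamReplicaInfoSketch {
  final String datanodeId;      // which DataNode holds this in-memory replica
  final long cachedOffset;      // first byte of the block that is cached
  final long cachedLength;      // number of cached bytes (block size if whole)

  RamReplicaInfoSketch(String datanodeId, long cachedOffset, long cachedLength) {
    this.datanodeId = datanodeId;
    this.cachedOffset = cachedOffset;
    this.cachedLength = cachedLength;
  }

  /** Does this replica hold the requested byte range of the block in memory? */
  boolean coversRange(long offset, long length) {
    return offset >= cachedOffset
        && offset + length <= cachedOffset + cachedLength;
  }
}
{code}

A client could then be directed to whichever RAM replica covers the range it is 
about to read.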

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf


 HDFS currently has no support for managing or exposing in-memory caches at 
 datanodes. This makes it harder for higher level application frameworks like 
 Hive, Pig, and Impala to effectively use cluster memory, because they cannot 
 explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS

2013-07-18 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712856#comment-13712856
 ] 

Sanjay Radia commented on HDFS-4949:


bq.  we have to use mmap of a file on disk.
Please look at my comments:  I have not objected to mmap and mlock.
I am fine with having RAM replicas backed by disk replicas; indeed I see this as 
an important advantage over ramfs, where the data is copied. The replication 
abstraction allows for a more general view where they are not, but our 
implementation restricts the memory replicas to be backed by disk replicas.


bq. In general, tiered storage management happens over a longer period of time 
than cache management.
The term tiered storage is unfortunate (I misused it in my original comment). In 
HDFS-2832, we consciously used the term heterogeneous storage and not tiered 
storage. Tiering, as in moving things based on their hotness, is policy. (BTW I 
envision using SSDs initially not for moving hot blocks but as storage for 
*one* of 3 replicas; I have discussed this use case with a few of the HBase 
folks.) Caching is a use case that applies well to disk vs. RAM. Both use cases 
apply well to the abstraction of replicas stored on different kinds of storage 
devices. 

bq. Memory is not a storage tier. It doesn't store anything; rather, it caches. 
Does it make sense to fsck memory? That is silly.
Memory and disks both store data, but one is far more durable. Fsck is a bad 
example - you run fsck on a file system, not on the disk. Here we are talking 
about entities that store HDFS block data. But this debate over the similarities 
and differences between RAM and disk is a longer one that we should have over 
beer. I am not blind to the differences between disks and RAM. Further, using 
the same abstraction to model RAM copies and disk copies does not mean that I am 
going to always treat them as exactly the same and ignore the differences. 

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf


 HDFS currently has no support for managing or exposing in-memory caches at 
 datanodes. This makes it harder for higher level application frameworks like 
 Hive, Pig, and Impala to effectively use cluster memory, because they cannot 
 explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS

2013-07-18 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713117#comment-13713117
 ] 

Sanjay Radia commented on HDFS-4949:


To converge on this could we do a meetup?

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf


 HDFS currently has no support for managing or exposing in-memory caches at 
 datanodes. This makes it harder for higher level application frameworks like 
 Hive, Pig, and Impala to effectively use cluster memory, because they cannot 
 explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5005) Move SnapshotException and SnapshotAccessControlException to o.a.h.hdfs.protocol

2013-07-17 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711729#comment-13711729
 ] 

Sanjay Radia commented on HDFS-5005:


+1 


 Move SnapshotException and SnapshotAccessControlException to 
 o.a.h.hdfs.protocol
 

 Key: HDFS-5005
 URL: https://issues.apache.org/jira/browse/HDFS-5005
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-5005.001.patch


 We should move the definition of these two exceptions to the protocol package 
 since they can be directly passed to clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

