[jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424559#comment-16424559 ] Sanjay Radia commented on HDFS-10419: - While I prefer HDSS, I would gladly let Anu, who has done the bulk of the heavy lifting in this project, have the final say on the name (unless his choice were truly horrible, which it isn't). The most passionate debates in any project are always about the name :) +1 for HDDS - Hadoop Distributed Data Store.
> Building HDFS on top of new storage layer (HDSL)
> Key: HDFS-10419
> URL: https://issues.apache.org/jira/browse/HDFS-10419
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Priority: Major
> Attachments: Evolving NN using new block-container layer.pdf
>
> In HDFS-7240, Ozone defines storage containers to store both the data and the metadata. The storage container layer provides an object storage interface and aims to manage data/metadata in a distributed manner. More details about storage containers can be found in the design doc in HDFS-7240.
> HDFS can adopt the storage containers to store and manage blocks. The general idea is:
> # Each block can be treated as an object, and the block ID is the object's key.
> # Blocks will still be stored in DataNodes, but as objects in storage containers.
> # The block management work can be separated out of the NameNode and handled by the storage container layer in a more distributed way. The NameNode will only manage the namespace (i.e., files and directories).
> # For each file, the NameNode only needs to record a list of block IDs, which are used as keys to obtain the real data from storage containers.
> # A new DFSClient implementation talks to both the NameNode and the storage container layer to read/write.
> HDFS, especially the NameNode, can get much better scalability from this design. Currently the NameNode's heaviest workload comes from block management, which includes maintaining the block-DataNode mapping, receiving full/incremental block reports, tracking block states (under/over/mis-replicated), and joining every write pipeline protocol to guarantee data consistency. This work brings a high memory footprint and makes the NameNode suffer from GC pressure. HDFS-5477 already proposes to convert the BlockManager into a service. If we can build HDFS on top of the storage container layer, we not only separate the BlockManager out of the NameNode, but also replace it with a new distributed management scheme.
> The storage container work is currently in progress in HDFS-7240, and the work proposed here is still in an experimental/exploratory stage. We can do this experiment in a feature branch so that people with interest can be involved.
> A design doc will be uploaded later explaining more details.
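To make the layering in the description above concrete, here is a minimal sketch of the block-as-object interaction it describes. This is purely illustrative: the interface name, method signatures, and types are assumptions, not the actual HDSL/Ozone API.

{code:java}
// Hypothetical sketch of the block-container layer as seen from the
// NameNode and DFSClient. All names here are illustrative assumptions;
// the real HDSL/Ozone interfaces differ.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

/** A block is addressed purely by its ID; the container layer resolves locations. */
interface StorageContainerClient {
    /** Open a stream to read the object stored under this block ID. */
    InputStream readBlock(long blockId) throws IOException;

    /** Allocate a new block object and return a stream for writing it. */
    OutputStream writeBlock(long blockId) throws IOException;
}

/** The NameNode's per-file metadata shrinks to a list of block IDs. */
class FileMetadata {
    final String path;
    final List<Long> blockIds; // keys into the container layer

    FileMetadata(String path, List<Long> blockIds) {
        this.path = path;
        this.blockIds = blockIds;
    }
}
{code}

In this split, the new DFSClient would resolve a path to a FileMetadata via the NameNode, then read each block ID directly through the StorageContainerClient, so block locations never pass through the NameNode.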
[jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420318#comment-16420318 ] Sanjay Radia commented on HDFS-10419: - I prefer HDSS: Hadoop Distributed Storage System. sanjay
[jira] [Commented] (HDFS-10419) Building HDFS on top of new storage layer (HDSL)
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403116#comment-16403116 ] Sanjay Radia commented on HDFS-10419: - In the "[VOTE] Merging branch HDFS-7240 to trunk" thread [~andrew.wang] asked:
{quote}*Sanjay says*:
> - NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen acknowledge the benefit of the new block layer). We have two choices here:
> ** a) Evolve the NN so that it can interact with both the old and the new block layer.
> ** b) Fork and create a new NN that works only with the new block layer; the old NN will continue to work with the old block layer.
> There are trade-offs, but clearly the 2nd option has the least impact on the old HDFS code.
*Andrew asks*: Are you proposing that we pursue the 2nd option to integrate HDSL with HDFS?
{quote}
Originally I would have preferred (a), but Owen made a strong case for (b) in my discussions with him last week. I believe the choice between (a) and (b) will depend strongly on what we want to do. For example, if we do milestone-1, get the 2x scalability, and decide to stop there, then clearly go with option (a) - it will require little refactoring, and one can run old and new HDFS side by side. If we plan to follow milestone-1 with, say, caching the working set of the namespace, then forking the NN code (i.e., option (b)) might be better, though the new NN will have to keep pulling over features and bug fixes from the old NN. Konstantine has proposed other alternatives, and we would evaluate (a) or (b) for those as well. I am not locked into any particular path or how we would do it.
[jira] [Updated] (HDFS-7240) Scaling HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Summary: Scaling HDFS (was: Object store in HDFS)
> Scaling HDFS
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Priority: Major
> Attachments: HDFS Scalability and Ozone.pdf, HDFS Scalability-v2.pdf, HDFS-7240.001.patch, HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, HadoopStorageLayerSecurity.pdf, MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
> [^HDFS Scalability-v2.pdf] describes areas where HDFS does well, its scaling challenges, and how to address those challenges. Scaling HDFS requires scaling both the namespace layer and the block layer. _This jira provides a new block layer, Hadoop Distributed Storage Layer (HDSL), that scales the block layer by grouping blocks into containers, thereby shrinking the block-to-location map and reducing the number of block reports and the cost of processing them._
> _A scalable namespace can be put on top of this scalable block layer:_
> * _HDFS-10419 describes how the existing NN can be modified to use the new block layer._
> * _HDFS-13074 also provides, as an *interim* step, a scalable flat Key-Value namespace on top of the new block layer; while it does not provide the HDFS API, it does support the Hadoop FS APIs (Hadoop FileSystem, FileContext)._
>
> Old Description
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer, i.e. the datanodes.
> In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata.
> I will soon update with a detailed design document.
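The scaling argument above rests on what map the block layer must keep in memory. A rough sketch of the grouping idea follows; the types, the ID layout, and the assumption that a block ID embeds its container ID are illustrative, not the actual HDSL data structures.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: instead of one entry per block (blockId -> locations), the block
// layer keeps one entry per container (containerId -> locations) and resolves
// a block via the container ID carried in its block ID. Illustrative only.
class ContainerMapSketch {
    /** A block ID that carries the ID of the container holding the block. */
    static class BlockId {
        final long containerId; // which container the block lives in
        final long localId;     // the block's ID within that container
        BlockId(long containerId, long localId) {
            this.containerId = containerId;
            this.localId = localId;
        }
    }

    // One entry per container. A container holds many blocks, so this map
    // is a small fraction of the size of a per-block location map, and
    // DataNodes need only report containers, not individual blocks.
    private final Map<Long, List<String>> containerLocations = new HashMap<>();

    /** Locating a block only needs the container-level map. */
    List<String> locate(BlockId id) {
        return containerLocations.get(id.containerId);
    }
}
{code}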
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Description: restored the attachment link [^HDFS Scalability-v2.pdf] in place of the broken link [Scaling HDFS| x] and expanded "KV" to "Key-Value"; the description text is otherwise unchanged from the version quoted in full under the "Scaling HDFS" update above.
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Description: an earlier edit of the same description in which the attachment link [^HDFS Scalability-v2.pdf] was replaced by the link [Scaling HDFS| x]; the text is otherwise unchanged from the version quoted in full under the "Scaling HDFS" update above.
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Description: set to the "Scaling HDFS" description quoted in full above.
was:
This jira proposes to add object store capabilities into HDFS.
As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer, i.e. the datanodes.
In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata.
I will soon update with a detailed design document.
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Attachment: HDFS Scalability-v2.pdf
[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336903#comment-16336903 ] Sanjay Radia commented on HDFS-10419: -
{quote}As a side note, I didn't understand why you used 50MB blocks in your math. The default is 128MB and many people run HDFS with 512MB blocks.
{quote}
While the default block size is 128MB, in many clusters, including the ones at Yahoo (at least in 2011 when I left), the actual average block size was 50MB, because most files had only one block and even that first block was not full.
[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020
[ https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334743#comment-16334743 ] Sanjay Radia commented on HDFS-12990: - For this Jira, I can go either way. In this specific case there is a very simple workaround for the 2.x customer if he reads the release notes and adds a config to use his old port number. Anyone going from 2.x to 3.0 should read the release notes. However, I can see that other, harder cases may come up over the next couple of months where there isn't such a convenient workaround. For example, we are likely to find new issues when we do upgrade testing, and there will be similar debates, including arguments over the importance of supporting rolling upgrade from 2.x to 3.0. To avoid such debates I suggest we amend our guideline along the following lines: On each new major release, until a tag such as "stable" is assigned by the PMC, we allow changes that make the software *compatible with the previous major release, even if that change breaks compatibility within the major release.* These changes may be for cases where we accidentally broke compatibility, or even where we consciously made a change without fully understanding or appreciating the impact of the incompatible change. The PMC will determine the testing criteria for assigning the stable tag. The above guideline will reduce such debates during the GA of a major release. Further, users will realize that they need to wait for the stable tag before going to production. One might argue that a GA release without a "stable" tag is merely a glorified beta or a GA candidate. Yes, I agree. Perhaps in this case we may have rushed from beta to GA a little too early. I think the above forces the PMC to seriously determine GA testing criteria and validate whether or not they have been met.
> Change default NameNode RPC port back to 8020
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
> Issue Type: Task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Priority: Critical
> Attachments: HDFS-12990.01.patch
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all default ports out of the ephemeral port range, which is very appreciated by admins. As part of that change, we also modified the NN RPC port from the famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes are fine, but the NN RPC port change is painful for downstream projects migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include the port number. But considering the downstream impact, instead of requiring all of them to change their stuff, it would be a far better experience to leave the NN port unchanged. This will benefit Hadoop 3 adoption and ease unnecessary upgrade burdens.
> It is of course incompatible, but given 3.0.0 is just out, IMO it is worth switching the port back.
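The workaround referred to above is a one-line configuration change. A minimal sketch using the standard Hadoop Configuration API; the host name nn.example.com is a placeholder, and in practice dfs.namenode.rpc-address would be set in hdfs-site.xml on the server side rather than in code:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LegacyPortExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pin the NameNode RPC endpoint to the classic 8020 port instead of
        // relying on the release default (9820 in early 3.0, per this jira).
        // The host name is a placeholder for illustration.
        conf.set("fs.defaultFS", "hdfs://nn.example.com:8020");
        conf.set("dfs.namenode.rpc-address", "nn.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/")));
    }
}
{code}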
[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302631#comment-16302631 ] Sanjay Radia commented on HDFS-10419: - Correct, the NN is not part of the consensus with the new block-container layer -- that is actually goodness: the two layers are decoupled, which helps reduce or eliminate global locks in the NN's state management. In current HDFS the NN drives the consensus on failures. Also, upon client death the NN closes an open file after establishing its length; in the new world it will still have to close an open file, but, as you state, by an explicit request asking the container replicas for the block length.
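An illustrative sketch of the close-on-client-death step described above: the NN asks the replicas of the open block's container for the committed block length and closes the file at that length. The interface, method names, and the max-of-committed-lengths rule are assumptions for illustration, not the actual HDSL recovery protocol.

{code:java}
import java.util.List;

class LeaseRecoverySketch {
    interface ContainerReplica {
        /** Length of the block as committed by the container's Raft log. */
        long committedLength(long blockId);
    }

    long recoverBlockLength(long blockId, List<ContainerReplica> replicas) {
        // Because the container layer keeps replicas consistent via Raft,
        // the committed length should agree across live replicas; we query
        // all of them and take the maximum committed value defensively.
        long length = 0;
        for (ContainerReplica replica : replicas) {
            length = Math.max(length, replica.committedLength(blockId));
        }
        return length;
    }
}
{code}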
[jira] [Created] (HDFS-12952) Change OzoneFS's semantics to allow readers to see file content while being written
Sanjay Radia created HDFS-12952: --- Summary: Change OzoneFS's semantics to allow readers to see file content while being written Key: HDFS-12952 URL: https://issues.apache.org/jira/browse/HDFS-12952 Project: Hadoop HDFS Issue Type: Bug Reporter: Sanjay Radia Assignee: Anu Engineer
Currently the Ozone KSM gives visibility to a file only when the file is closed, which is similar to the S3 FS. OzoneFS should allow partial file visibility while a file is being written, to match HDFS. Note this should be fairly straightforward, because the block-container layer maintains block-length consistency as a block is being written, even under failures, due to its use of Raft.
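The HDFS semantics being matched here can be demonstrated with the stock FileSystem API: after a writer calls hflush(), a reader opening the still-open file already sees the flushed bytes. A minimal sketch; the path and buffer size are illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class PartialVisibilityExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/visibility-demo");

        FSDataOutputStream out = fs.create(p);
        out.write("partial content".getBytes(StandardCharsets.UTF_8));
        out.hflush(); // make the written bytes visible to new readers

        // A second client opening the still-open file sees the flushed bytes.
        try (FSDataInputStream in = fs.open(p)) {
            byte[] buf = new byte[15];
            in.readFully(0, buf);
            System.out.println(new String(buf, StandardCharsets.UTF_8));
        }

        out.close();
    }
}
{code}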
[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298728#comment-16298728 ] Sanjay Radia commented on HDFS-10419: - The new block-container layer allows partial visibility of a block even before the block has been finalized and closed. It maintains block-length consistency, without the help of the namespace layer (KSM or NN), as a block is being written, even under failures, due to its use of Raft. Hence the NN, when plugged into the new block-container layer, will be fine - we can get the HDFS semantics. You are right that the Ozone KSM gives visibility to a file only when the file is closed, which is similar to the S3 FS. I will create a jira to fix that so that OzoneFS matches HDFS on partial file visibility while a file is being written.
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297454#comment-16297454 ] Sanjay Radia commented on HDFS-7240: - One of the issues raised is that connecting the NN to the new block-container layer will be very difficult because removing the FSN/BM lock is challenging. I have attached a doc, [Evolving NN using new block container layer|https://issues.apache.org/jira/secure/attachment/12902931/Evolving%20NN%20using%20new%20block-container%20layer.pdf], to HDFS-10419 that describes 2 milestones for connecting the NN to the new block-container layer. The first one does *not* require removing the FSN/BM lock and still gives close to 2x scalability, because the block map (which becomes the container map) is reduced significantly. I would also like to point out (as stated above, and in the doc) that the new block-container layer keeps consistent state using Raft, and hence eliminates the coupling between the namespace layer and the block layer; the 2nd milestone of removing the FSN/BM lock is much easier with the new block layer. If you disagree with my lock argument, the first milestone still gets good scalability without removing the lock.
[jira] [Updated] (HDFS-10419) Building HDFS on top of Ozone's storage containers
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-10419: Attachment: Evolving NN using new block-container layer.pdf
I have attached a doc that describes how the existing NN can be modified to plug in the new block-container layer provided by HDFS-7240. Two key milestones are described. The first milestone keeps the Container Map in the NN (getting us almost 2x scalability, since the container map is roughly 1/40th the size of the original block map, assuming an average actual block size of 50MB); this milestone does NOT require removing the FSN/BM lock. The second milestone removes the container map and block management completely, which gets us to the full 2x scalability. After the second milestone, the NN can be evolved in several directions for further scalability.
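The 1/40 ratio follows from simple arithmetic once a container size is assumed. A back-of-the-envelope check; the ~2GB container size is an assumption for illustration, while the 50MB average actual block size comes from the comments above:

{code:java}
public class ContainerMapMath {
    public static void main(String[] args) {
        long avgBlockBytes = 50L * 1024 * 1024;        // ~50MB average actual block size
        long containerBytes = 2L * 1024 * 1024 * 1024; // assumed ~2GB container

        long blocksPerContainer = containerBytes / avgBlockBytes; // ~40
        System.out.println("blocks per container: " + blocksPerContainer);
        // With ~40 blocks per container, a containerId -> locations map has
        // ~1/40th the entries of a per-block blockId -> locations map, which
        // is the basis of the "almost 2x NN scalability" estimate above.
    }
}
{code}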
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Attachment: (was: Evolving NN using new block-container layer.pdf)
[jira] [Issue Comment Deleted] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Comment: was deleted (was: the two-milestone attachment comment, identical to the one quoted in the update below)
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Attachment: Evolving NN using new block-container layer.pdf
I have attached a doc that describes how the existing NN can be modified to plug in the new block-container layer provided by HDFS-7240. Two key milestones are described. The first milestone keeps the Container Map in the NN (getting us almost 2x scalability, since the container map is roughly 1/40th the size of the original block map, assuming an *average actual* block size of 50MB); this milestone does NOT require removing the FSN/BM lock. The second milestone removes the container map and block management completely, which gets us to the full 2x scalability. After the second milestone, the NN can be evolved in several directions for further scalability.
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261791#comment-16261791 ] Sanjay Radia commented on HDFS-7240: Ozone Cloudera Meeting Date: Thursday, November 16th 2017 Location: online conferencing Attendees: ATM, Andrew, Anu, Aaron Fabbri, Jitendra, Sanjay, Sean Mackrory, other listeners on the phone Main discussion centered around: * Wouldn't Ozone be better off as a separate project? * Why should it be merged now? Discussion: (This incorporate Andrew’s minutes and adds to it.) * Anu: Don't want to have this separate since it confuses people about the long-term vision of Ozone. It's intended as block management for HDFS. * Andrew: In its current state, Ozone cannot be plugged into the NN as the BM layer, so it seems premature to merge. Can't benefit existing users, and they can't test it. * Response: The Ozone block layer is at a good integration point, and we want to move on with the NameNode integration as new block layer. Benefits via KV namespace/FileSystemAPI is there and completely usable for Hive and Spark apps. * Andrew: We can do the FSN/BM lock split without merging Ozone. Separate efforts. This lock split is also a major effort by itself, and is a dangerous change. It's something that should be baked in production. * Sanjay: Agree that the lock split should be done in branch. But disagree on how hard it will be. The split was hard in past but will be easier with new block layer: one of the key reasons for the coupling of Block-layer to Namespace layer is that the block length of the each replica at block close time, esp under failures, has to be consistent. This is done in the central NN today (due to lack of raft/paxos like protocol in the original block layer). The block-container layer uses raft for consistency and no longer needs a central agent like the NN. Then new block-layers built-in consistent state management simplifies the separation. * Sanjay: Ozone developers "willing to take the hit" of the slow Hadoop release cadence. Want to make this part of HDFS since it's easier for users to test and consume without installing a new cluster. * ATM: Can still share the same hardware, and run the Ozone daemons alongside. * Sanjay countered this * Sanjay: Want to keep Ozone block management inside the Datanode process to enable various synergies such as sharing the new netty based protocol engy or fast-copy between HDFS and Ozone. Not all data needs all the HDFS features like encryption, erasure coding, etc, and this data could be stored in Ozone. * Andrew: This fast-copy hasn't been implemented or discussed yet. Unclear if it'll work at all with existing HDFS block management. Won't work with encryption or erasure coding. Not clear whether it requires being in the same DN process even. * It does have to work with encryption and EC to give value. It can work with non-encrypted and non EC which are majority of blocks in most Hadoop clusters. We will provide a design of the shallow-copy. Sanjay/Anu: Ozone is also useful to test with just the key-value interface. It's a Hadoop-compatible FileSystem, so apps many apps such as Hive and Spark can work also on Ozone since they have or ensured that they work well on KV flat namespace. * Andrew: If it provides a new API and doesn't support the HDFS feature-set, doesn't this support it being its own project? * Sanjay - It provides the EXISTING Hadoop FileSystem interface now. 
Note customers are used to having different parts of the namespace(s) with different features: customers have asked for Zones with different features enabled [see summary - to avoid duplication]. * AaronF: Ozone is a lot of new code and Hadoop already has so much code. It is better to have separate projects and not add to Hadoop/HDFS. Sanjay: Agree it is a lot of code. Sometimes we have to add significant new code to move a project forward. We have tried to incrementally work around HDFS scaling, the NN’s manageability, and slow startup issues. This new code base fundamentally moves us forward in addressing these long-standing issues. Besides, this “lots of new code” argument can be used later to prevent the merge of the projects. Summary: There is agreement that the new block-container layer is a good way to solve the block scaling issue of HDFS. There is no consensus on merging the branch in vs. forking Ozone into a new project. The main objection to merging into HDFS is that integrating the new block-container layer with the existing NN will be a very hard project, since the lock split in the NN is very challenging. Cloudera’s team perspective: (taken from Andrew’s minutes) * Ozone could be its own project and integrated later, or remain on an HDFS branch. There are benefits to Ozone being a separate project. Can release faster, iterate more quickly on feedback, and
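The consistency argument above (replica lengths are agreed at block close time via raft, rather than by the central NN) can be sketched as follows. This is purely illustrative: ReplicatedLog, ContainerStateMachine, closeBlock and apply are invented names, not Ozone code.
{code:java}
// In today's HDFS the NN is the central agent that decides the agreed final
// length of a block's replicas at close time. With a raft-replicated block
// layer, the close is a log entry that the replica quorum applies identically,
// so no central NameNode is needed for this decision.
interface ReplicatedLog {
  // Submit an entry; returns only once a quorum has committed it.
  void commit(byte[] entry) throws java.io.IOException;
}

class ContainerStateMachine {
  private final ReplicatedLog raftLog;
  private final java.util.Map<Long, Long> blockLengths = new java.util.HashMap<>();

  ContainerStateMachine(ReplicatedLog raftLog) { this.raftLog = raftLog; }

  // Close is agreed via raft, so every replica records the same final
  // length, even under failures.
  void closeBlock(long blockId, long finalLength) throws java.io.IOException {
    raftLog.commit(encode(blockId, finalLength));
  }

  // Applied on every replica after the quorum commits the entry.
  void apply(long blockId, long finalLength) {
    blockLengths.put(blockId, finalLength);
  }

  private static byte[] encode(long blockId, long len) {
    return java.nio.ByteBuffer.allocate(16).putLong(blockId).putLong(len).array();
  }
}
{code}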
[jira] [Commented] (HDFS-10419) Building HDFS on top of Ozone's storage containers
[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238379#comment-16238379 ] Sanjay Radia commented on HDFS-10419: - HDFS-5389 describes one approach to building a NN that scales its namespace better than the current NN. It proposes caching only the working-set namespace in memory; also see [HUG - Removing Namenode's Limitation|https://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation]. Independent studies have also analysed LRU caching of HDFS metadata: [Metadata Traces and Workload Models for Evaluating Big Storage Systems|https://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation]. This approach works because, in spite of having large amounts of data (say, data for the last five years), most of the data that is accessed is recent (say, the last 3-9 months); hence the working set can fit in memory. > Building HDFS on top of Ozone's storage containers > -- > > Key: HDFS-10419 > URL: https://issues.apache.org/jira/browse/HDFS-10419 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Major > > In HDFS-7240, Ozone defines storage containers to store both the data and the > metadata. The storage container layer provides an object storage interface > and aims to manage data/metadata in a distributed manner. More details about > storage containers can be found in the design doc in HDFS-7240. > HDFS can adopt the storage containers to store and manage blocks. The general > idea is: > # Each block can be treated as an object and the block ID is the object's key. > # Blocks will still be stored in DataNodes but as objects in storage > containers. > # The block management work can be separated out of the NameNode and will be > handled by the storage container layer in a more distributed way. The > NameNode will only manage the namespace (i.e., files and directories). > # For each file, the NameNode only needs to record a list of block IDs which > are used as keys to obtain real data from storage containers. > # A new DFSClient implementation talks to both NameNode and the storage > container layer to read/write. > HDFS, especially the NameNode, can get much better scalability from this > design. Currently the NameNode's heaviest workload comes from the block > management, which includes maintaining the block-DataNode mapping, receiving > full/incremental block reports, tracking block states (under/over/miss > replicated), and joining every writing pipeline protocol to guarantee the > data consistency. These work bring high memory footprint and make NameNode > suffer from GC. HDFS-5477 already proposes to convert BlockManager as a > service. If we can build HDFS on top of the storage container layer, we not > only separate out the BlockManager from the NameNode, but also replace it > with a new distributed management scheme. > The storage container work is currently in progress in HDFS-7240, and the > work proposed here is still in an experimental/exploring stage. We can do > this experiment in a feature branch so that people with interests can be > involved. > A design doc will be uploaded later explaining more details. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
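A minimal sketch of the working-set idea, using an access-ordered LinkedHashMap as the LRU policy; the capacity and the eviction behavior are illustrative, not NN internals:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Keep only recently used namespace entries in memory and evict the cold
// ones (which, in a real NN, would remain readable from a persistent store).
class WorkingSetCache<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  WorkingSetCache(int capacity) {
    super(16, 0.75f, true /* access order => LRU */);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // A real NN would write the evicted entry back / re-read it on demand;
    // this sketch simply drops it.
    return size() > capacity;
  }
}
{code}
If the last 3-9 months of data really is the working set, a cache sized for that fraction of the namespace serves nearly all requests as if the full namespace were in memory.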
[jira] [Updated] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-7240: --- Attachment: HDFS Scalability and Ozone.pdf I have added a document that explains a design for scaling HDFS and how Ozone paves the way towards the full solution. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey >Priority: Major > Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, > HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, > HDFS-7240.004.patch, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9244) Support nested encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107238#comment-15107238 ] Sanjay Radia commented on HDFS-9244: The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is it such a big deal for an admin to set up an EZ as he creates the directories? I think nested encryption will complicate things like volumes down the road, and I don't think this extra complexity is necessary. I will comment on the volumes jira to drive that discussion to a conclusion. > Support nested encryption zones > --- > > Key: HDFS-9244 > URL: https://issues.apache.org/jira/browse/HDFS-9244 > Project: Hadoop HDFS > Issue Type: New Feature > Components: encryption >Reporter: Xiaoyu Yao >Assignee: Zhe Zhang > Attachments: HDFS-9244.00.patch, HDFS-9244.01.patch > > > This JIRA is opened to track adding support of nested encryption zone based > on [~andrew.wang]'s [comment |https://issues.apache.org/jira/browse/HDFS-8747?focusedCommentId=14654141=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14654141] > for certain use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8747) Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106889#comment-15106889 ] Sanjay Radia edited comment on HDFS-8747 at 1/19/16 7:23 PM: - Posted on wrong jira by mistake (moving comment to HDFS-9244) -The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is it such a big deal for an admin to set up EZ as he creates the directories in dirs? I think nested encryption will complicate things like volumes down the road and I don't think this extra complexity is necessary.- was (Author: sanjay.radia): The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is it such a big deal for an admin to set up EZ as he creates the directories in dirs? I think nested encryption will complicate things like volumes down the road and I don't think this extra complexity is necessary. > Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption > Zones > -- > > Key: HDFS-8747 > URL: https://issues.apache.org/jira/browse/HDFS-8747 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, > HDFS-8747-07292015.pdf > > > HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to > allow create encryption zone on top of a single HDFS directory. Files under > the root directory of the encryption zone will be encrypted/decrypted > transparently upon HDFS client write or read operations. > Generally, it does not support rename(without data copying) across encryption > zones or between encryption zone and non-encryption zone because different > security settings of encryption zones. However, there are certain use cases > where efficient rename support is desired. This JIRA is to propose better > support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft > Delete” (a.k.a. trash) with HDFS encryption zones. > “Scratch Space” is widely used in Hadoop jobs, which requires efficient > rename support. Temporary files from MR jobs are usually stored in staging > area outside encryption zone such as “/tmp” directory and then rename to > targeted directories as specified once the data is ready to be further > processed. > Below is a summary of supported/unsupported cases from latest Hadoop: > * Rename within the encryption zone is supported > * Rename the entire encryption zone by moving the root directory of the zone > is allowed. > * Rename sub-directory/file from encryption zone to non-encryption zone is > not allowed. > * Rename sub-directory/file from encryption zone A to encryption zone B is > not allowed. > * Rename from non-encryption zone to encryption zone is not allowed. > “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that > helps prevent accidental deletion of files and directories. If trash is > enabled and a file or directory is deleted using the Hadoop shell, the file > is moved to the .Trash directory of the user's home directory instead of > being deleted. Deleted files are initially moved (renamed) to the Current > sub-directory of the .Trash directory with original path being preserved. > Files and directories in the trash can be restored simply by moving them to a > location outside the .Trash directory. > Due to the limited rename support, delete sub-directory/file within > encryption zone with trash feature is not allowed. Client has to use > -skipTrash option to work around this. 
HADOOP-10902 and HDFS-6767 improved > the error message but without a complete solution to the problem. > We propose to solve the problem by generalizing the mapping between > encryption zone and its underlying HDFS directories from 1:1 today to 1:N. > The encryption zone should allow non-overlapped directories such as scratch > space or soft delete "trash" locations to be added/removed dynamically after > creation. This way, rename for "scratch space" and "soft delete" can be > better supported without breaking the assumption that rename is only > supported "within the zone". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106889#comment-15106889 ] Sanjay Radia commented on HDFS-8747: The main motivation for nested EZ is root + subdirs as per Andrew's comment. Is it such a big deal for an admin to set up EZ as he creates the directories in dirs? I think nested encryption will complicate things like volumes down the road and I don't think this extra complexity is necessary. > Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption > Zones > -- > > Key: HDFS-8747 > URL: https://issues.apache.org/jira/browse/HDFS-8747 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, > HDFS-8747-07292015.pdf > > > HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to > allow create encryption zone on top of a single HDFS directory. Files under > the root directory of the encryption zone will be encrypted/decrypted > transparently upon HDFS client write or read operations. > Generally, it does not support rename(without data copying) across encryption > zones or between encryption zone and non-encryption zone because different > security settings of encryption zones. However, there are certain use cases > where efficient rename support is desired. This JIRA is to propose better > support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft > Delete” (a.k.a. trash) with HDFS encryption zones. > “Scratch Space” is widely used in Hadoop jobs, which requires efficient > rename support. Temporary files from MR jobs are usually stored in staging > area outside encryption zone such as “/tmp” directory and then rename to > targeted directories as specified once the data is ready to be further > processed. > Below is a summary of supported/unsupported cases from latest Hadoop: > * Rename within the encryption zone is supported > * Rename the entire encryption zone by moving the root directory of the zone > is allowed. > * Rename sub-directory/file from encryption zone to non-encryption zone is > not allowed. > * Rename sub-directory/file from encryption zone A to encryption zone B is > not allowed. > * Rename from non-encryption zone to encryption zone is not allowed. > “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that > helps prevent accidental deletion of files and directories. If trash is > enabled and a file or directory is deleted using the Hadoop shell, the file > is moved to the .Trash directory of the user's home directory instead of > being deleted. Deleted files are initially moved (renamed) to the Current > sub-directory of the .Trash directory with original path being preserved. > Files and directories in the trash can be restored simply by moving them to a > location outside the .Trash directory. > Due to the limited rename support, delete sub-directory/file within > encryption zone with trash feature is not allowed. Client has to use > -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved > the error message but without a complete solution to the problem. > We propose to solve the problem by generalizing the mapping between > encryption zone and its underlying HDFS directories from 1:1 today to 1:N. 
> The encryption zone should allow non-overlapped directories such as scratch > space or soft delete "trash" locations to be added/removed dynamically after > creation. This way, rename for "scratch space" and "soft delete" can be > better supported without breaking the assumption that rename is only > supported "within the zone". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8888) Support volumes in HDFS
[ https://issues.apache.org/jira/browse/HDFS-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702393#comment-14702393 ] Sanjay Radia commented on HDFS-8888: There are several motivations for introducing Volumes to HDFS. Simplify management and implementation * Volumes make the management of some HDFS features simpler: Quotas, Encryption, Snapshots can become volume properties rather than properties of individual directories. As a unit of management, Volumes also offer strong isolation in security settings. * They can simplify the implementation of some of them. For example, if we don’t allow renaming across a volume boundary then Snapshots’ implementation becomes easier. Will customers accept this restriction? Won’t some apps like Hive have to change since they rename from temp to final destination? Recall we disallow renames across encryption zones and customers have found that acceptable. Further, we changed Hive to deal with this restriction. * Volumes can also simplify the management of datasets. For example, one can associate different policies with volumes. For example, one can set up backup policies across DR zones based on volumes. Isn’t it more flexible to have features like encryption, snapshots on arbitrary directories? Having a car with independent steering for each wheel is more flexible, but steering 2 wheels together makes a car easier to control. Volumes, while restricting the granularity, will simplify management and also the implementation. (A rough sketch of this model follows this comment.) *Relation to Federation* How are volumes related to Federation? Currently in federation, each NN has a single volume. This Jira will allow each NN to have multiple volumes. Volumes add to the Federation model. One can distribute/load balance volumes across NNs. Further, it allows N+K failover especially when we add partial namespace caching (HDFS-). (More on this later.) Other things to explore with Volumes (outside the scope of this Jira) * Each volume could become its own RW lock within the NN. This would improve parallelism within the NN without much additional effort. * Each volume could have its own image/journal to allow relocation of a volume to another NN (see federation). * Associate storage policies with a volume such that the volume is backed by the same storage. This semantic allows new features like co-located data. Support volumes in HDFS --- Key: HDFS-8888 URL: https://issues.apache.org/jira/browse/HDFS-8888 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai There are multiple types of zones (e.g., snapshottable directories, encryption zones, directories with quotas) which are conceptually close to namespace volumes in traditional file systems. This jira proposes to introduce the concept of volume to simplify the implementation of snapshots and encryption zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
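A rough illustrative sketch of the model argued for above: a volume as the unit that owns quota/encryption/snapshot settings, with renames confined to a single volume. All class and method names here are invented for illustration; nothing like this exists in HDFS today.
{code:java}
// A Volume owns the management settings; directories inside it inherit them.
class Volume {
  final String name;
  long quotaBytes;          // quota is a volume property, not per-directory
  boolean encrypted;        // likewise encryption...
  boolean snapshotsEnabled; // ...and snapshots

  Volume(String name) { this.name = name; }
}

abstract class VolumeNamespace {
  /** Resolve a path to its owning volume (longest-prefix lookup, elided). */
  abstract Volume volumeOf(String path);

  final void rename(String src, String dst) throws java.io.IOException {
    if (volumeOf(src) != volumeOf(dst)) {
      // Same style of restriction as encryption zones today; this is what
      // keeps the snapshot implementation simple under this model.
      throw new java.io.IOException("rename across a volume boundary is not allowed");
    }
    // actual namespace update elided
  }
}
{code}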
[jira] [Comment Edited] (HDFS-8888) Support volumes in HDFS
[ https://issues.apache.org/jira/browse/HDFS-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702393#comment-14702393 ] Sanjay Radia edited comment on HDFS- at 8/19/15 3:58 AM: - There are several motivations for introducing Volumes to HDFS. Simplify management and implementation * Volumes make the management of some HDFS features simpler: Quotas, Encryption, Snapshots can become volume properties rather than properties of individual directories. As a unit of management, Volumes also offers strong isolations in the security settings. * It can simplify the implementation of some them. For example if we don’t allow renaming across a volume boundary then Snapshots’ implementation become easier. Will customers accept this restriction? Won’t some apps like Hive have to change since they rename from temp to final destination? Recall we disallow renames across encryption zones and customers have found that acceptable. Further, we changed Hive to deal with this restriction. * Volumes can also simplify the management of datasets. For example one can associate different other policies for volumes. For example one can setup backup policies across DR zones based on volumes. Isn’t it more flexible to have features like encryption, snapshots on arbitrary directories? Having a car with independent steering for each wheel is more flexible, but steering 2 wheels together makes a car easier to control. Volumes, while restricting the granularity, will simplify management and also the implementation. *Relation to Federation* How are volumes related to Federation? Currently in federation, each NN has a single volume. This Jira will allow each NN to have multiple volumes. Volumes adds to the Federation model. One can distribute/load balance volumes across NNs. Further it allows N+K failover especially when we add partial namespace caching (HDFS-). (More on this later.) *Other things to explore with Volumes* (outside the scope of this Jira) * Each volume could become its own RW lock with in the NN. This would improve parallelism within NN without much additional effort. * Each volume could have its own image/journal to allow relocation of a volume to another NN (see federation). * Associate storage policies with a volume such as the volume is backed by the same storage. The semantic allows new features like co-located data. was (Author: sanjay.radia): There are several motivations for introducing Volumes to HDFS. Simplify management and implementation * Volumes make the management of some HDFS features simpler: Quotas, Encryption, Snapshots can become volume properties rather than properties of individual directories. As a unit of management, Volumes also offers strong isolations in the security settings. * It can simplify the implementation of some them. For example if we don’t allow renaming across a volume boundary then Snapshots’ implementation become easier. Will customers accept this restriction? Won’t some apps like Hive have to change since they rename from temp to final destination? Recall we disallow renames across encryption zones and customers have found that acceptable. Further, we changed Hive to deal with this restriction. * Volumes can also simplify the management of datasets. For example one can associate different other policies for volumes. For example one can setup backup policies across DR zones based on volumes. Isn’t it more flexible to have features like encryption, snapshots on arbitrary directories? 
Having a car with independent steering for each wheel is more flexible, but steering 2 wheels together makes a car easier to control. Volumes, while restricting the granularity, will simplify management and also the implementation. *Relation to Federation* How are volumes related to Federation? Currently in federation, each NN has a single volume. This Jira will allow each NN to have multiple volumes. Volumes adds to the Federation model. One can distribute/load balance volumes across NNs. Further it allows N+K failover especially when we add partial namespace caching (HDFS-). (More on this later.) Other things to explore with Volumes (outside the scope of this Jira) * Each volume could become its own RW lock with in the NN. This would improve parallelism within NN without much additional effort. * Each volume could have its own image/journal to allow relocation of a volume to another NN (see federation). * Associate storage policies with a volume such as the volume is backed by the same storage. The semantic allows new features like co-located data. Support volumes in HDFS --- Key: HDFS- URL: https://issues.apache.org/jira/browse/HDFS- Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai There
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588681#comment-14588681 ] Sanjay Radia commented on HDFS-7923: bq. This change is really helpful during startup on big clusters. In the past we have seen restarting all the DNs at once on a several hundred node cluster bring the NN to its knees. There is already a random backoff for the initial block report. You can configure the initial BR backoff time. When that jira was done there was a proposal to give each DN a different backoff time depending on the number of outstanding BRs; this enhancement was not done at that time because this backoff worked very well. For a several hundred node cluster the initial BR backoff time should be approx 60sec. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, HDFS-7923.006.patch, HDFS-7923.007.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
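For reference, the knob in question is the standard initial block report delay setting; on a several-hundred-node cluster it might be set along these lines in hdfs-site.xml (the 60-second value echoes the rough figure above, not a universal recommendation):
{code:xml}
<property>
  <name>dfs.blockreport.initialDelay</name>
  <!-- Each DN delays its first full block report by a random amount
       between 0 and this many seconds, spreading startup load on the NN. -->
  <value>60</value>
</property>
{code}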
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588931#comment-14588931 ] Sanjay Radia commented on HDFS-7923: Starvation: I didn't literally mean starvation, but was more concerned about fairness and safety. Can a DN's block report be delayed for a significant period of time, or, due to a subtle bug, even longer? Our current implementation is very resilient - DNs just send the BRs at a specific period irrespective of the NN. Does your design have a safety net - say, a DN will wait a max of 2 periods to get permission (or something like that)? The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, HDFS-7923.006.patch, HDFS-7923.007.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
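One possible shape for such a safety net, as hypothetical DN-side logic rather than anything in the HDFS-7923 patch (BlockReportScheduler and the 2-period bound are invented for illustration):
{code:java}
// The DN asks for permission on heartbeats, but after deferring for a
// bounded number of block report periods it sends the FBR unconditionally,
// so a NN-side bug cannot delay full block reports forever.
class BlockReportScheduler {
  static final int MAX_DEFERRED_PERIODS = 2; // illustrative bound

  private int deferredPeriods = 0;

  boolean shouldSendFullBlockReport(boolean nnGrantedPermission) {
    if (nnGrantedPermission || deferredPeriods >= MAX_DEFERRED_PERIODS) {
      deferredPeriods = 0;
      return true; // send the FBR now
    }
    deferredPeriods++; // defer; try again on a later heartbeat
    return false;
  }
}
{code}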
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587054#comment-14587054 ] Sanjay Radia commented on HDFS-7923: How will you ensure that a particular DN does not get starved? I.e., how do you guarantee that BRs will get through? HDFS depends on periodic BRs for correctness. I recall discussions with Facebook where they changed their HDFS to use incremental BRs but still kept full BRs at a lower frequency just for safety. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, HDFS-7923.006.patch, HDFS-7923.007.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587163#comment-14587163 ] Sanjay Radia commented on HDFS-7923: Have you considered a pull model (NN pulls) which does not risk starvation? The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.8.0 Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch, HDFS-7923.006.patch, HDFS-7923.007.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8401) Memfs - a layered file system for in-memory storage in HDFS
[ https://issues.apache.org/jira/browse/HDFS-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564043#comment-14564043 ] Sanjay Radia commented on HDFS-8401: Consider the following use case: one wants to run a few jobs and cache the input and the intermediate output just for the duration of these jobs. Today the user has to pin such data by changing the dir-file attributes, and when the jobs are finished he has to reset the attributes. It is easier to say jobxxx input = memfs://.../input tmp=memfs://.../tmpdir output=. Here setting the scheme is not inconvenient since it is part of the parameters to a program. Further, this works with any existing application - Hive, Pig, etc. - since the hint to cache is in the scheme of the pathname. Our existing policies and dir-level settings work when things are semi-permanent (i.e., this dir has dimension tables, please cache them - all jobs will benefit). In addition, we could add (or already have) programmatic APIs to indicate that a file being read or written needs to be cached. But this requires changes to the application code. Once we get fully automated memory caching working we will not need our existing storage policies nor layers like memfs, since the system will just take care of it all - but it will take us some time to get there. I think both approaches have their own strengths and are complementary. Note spark-tachyon uses a layered file system, and the approach is viewed as a simple way to control which files get cached on a per-job basis. Further, one can also cache specific Hive tables in the Hive metastore by giving a pathname that has the memfs scheme. Here the memfs pathname and setting the dir's attributes are roughly equal from an ease-of-use perspective. An additional point about memfs for non-hdfs systems: the Memfs *abstraction* allows caching S3 data in a very similar fashion. Of course one will have to build a full caching implementation of memfs for S3, because the memfs proposed in this Jira is a very thin layer over HDFS - ALL the caching mechanism is already in HDFS. So I expect several implementations of the memfs interface for HCFS file systems. Memfs - a layered file system for in-memory storage in HDFS --- Key: HDFS-8401 URL: https://issues.apache.org/jira/browse/HDFS-8401 Project: Hadoop HDFS Issue Type: Bug Reporter: Arpit Agarwal Assignee: Arpit Agarwal We propose creating a layered filesystem that can provide in-memory storage using existing features within HDFS. memfs will use lazy persist writes introduced by HDFS-6581. For reads, memfs can use the Centralized Cache Management feature introduced in HDFS-4949 to load hot data to memory. Paths in memfs and hdfs will correspond 1:1 so memfs will require no additional metadata and it can be implemented entirely as a client-side library. The advantage of a layered file system is that it requires little or no changes to existing applications. e.g. Applications can use something like {{memfs://}} instead of {{hdfs://}} for files targeted to memory storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
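The 1:1 path mapping that makes memfs metadata-free can be sketched in a few lines. This is an illustrative helper, not part of the HDFS-8401 patch; the class name is invented:
{code:java}
import java.net.URI;
import org.apache.hadoop.fs.Path;

// A memfs path is the same HDFS path with only the scheme changed, so the
// layer needs no path table of its own.
public class MemfsPaths {
  /** memfs://nn:8020/warehouse/dim -> hdfs://nn:8020/warehouse/dim */
  public static Path toHdfs(Path memfsPath) {
    URI u = memfsPath.toUri();
    return new Path("hdfs", u.getAuthority(), u.getPath());
  }
}
{code}
With this mapping, a job parameter like {{input=memfs://nn:8020/input}} needs no translation table; the layer simply swaps the scheme, delegates the operation to HDFS, and adds the in-memory hints (lazy-persist writes, cache directives for reads) on the way through.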
[jira] [Commented] (HDFS-8241) Remove unused Namenode startup option FINALIZE
[ https://issues.apache.org/jira/browse/HDFS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514487#comment-14514487 ] Sanjay Radia commented on HDFS-8241: Yes, it was unfortunate that this incompatibility was missed. Q. Do folks feel that a startup -finalize option to the NN is a good interface in ADDITION to the admin command to finalize? Clearly we need the admin command to finalize, since one does not want to restart the NN just to finalize. Remove unused Namenode startup option FINALIZE - Key: HDFS-8241 URL: https://issues.apache.org/jira/browse/HDFS-8241 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-8241.patch Command : hdfs namenode -finalize 15/04/24 22:26:23 INFO namenode.NameNode: createNameNode [-finalize] *Use of the argument 'FINALIZE' is no longer supported.* To finalize an upgrade, start the NN and then run `hdfs dfsadmin -finalizeUpgrade' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8075) Revist layout version
Sanjay Radia created HDFS-8075: -- Summary: Revist layout version Key: HDFS-8075 URL: https://issues.apache.org/jira/browse/HDFS-8075 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: Sanjay Radia Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * HDFS has a layout version which is changed on each change (even if only an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to the old binary version but use the existing image/edits so that newly created files are not lost ** rollback: go back to the checkpoint created before the upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade in some circumstances, which we don't today. Here are use cases: * Some changes can support downgrade even though there was a change in layout, since there is no real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT to use such new features till the upgrade is finalized, in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - there is likely to be a fairly fundamental change in layout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8075) Revist layout version
[ https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia resolved HDFS-8075. Resolution: Duplicate Closed: duplicate of HDFS-5223 Revist layout version - Key: HDFS-8075 URL: https://issues.apache.org/jira/browse/HDFS-8075 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: Sanjay Radia Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - they is likely to be a fairly fundamental change in layout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version
[ https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483654#comment-14483654 ] Sanjay Radia commented on HDFS-5223: For the edits one could require that in order to downgrade you must do a save-image and then delete the (now null) edit log. We would then limit our solution to the image. For the image we could do the following: * Add a *second* layout version field (call it compatible-layout-version) that indicates which version can safely read the image without data-loss. A NN that starts up will compare this field with its current layout version and then proceed as long as the edit log is null (a sketch of this check follows this message). ** The ACL example (see Jira description) will state that the previous version can safely read the image without data loss. Of course newly created ACLs would be lost. ** The truncate example is tricky: one can safely downgrade if the truncate operation was not used. We could add code to not allow such new features till finalize is done. This is somewhat analogous to what ext3 was trying to do with its superblock feature flags (see Todd's comment above); what I am proposing is slightly different since it limits such features till the upgrade is finalized, while ext3's approach is more general in that you can downgrade at any time as long as you have not used the feature. Alternatively, we could simply not support downgrade for such a feature and simply mark the compatible-layout-version accordingly. Allow edit log/fsimage format changes without changing layout version - Key: HDFS-5223 URL: https://issues.apache.org/jira/browse/HDFS-5223 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Assignee: Colin Patrick McCabe Attachments: HDFS-5223.004.patch Currently all HDFS on-disk formats are version by the single layout version. This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full `namenode -upgrade' process which requires coordination with DNs, etc. HDFS should support a lighter weight alternative. Copied description from HDFS-8075 which is a duplicate and now closed. Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. 
* A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - they is likely to be a fairly fundamental change in layout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
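A minimal sketch of the startup check proposed in the comment above. This is hypothetical code, not NN internals; real HDFS layout versions are negative and decrease over time, but plain integers keep the illustration simple:
{code:java}
// The image carries two fields: the layoutVersion that wrote it, and a
// compatibleLayoutVersion naming the oldest software that can read it
// without data loss.
class LayoutCheck {
  static void checkCanLoad(int softwareVersion,
                           int imageCompatibleVersion,
                           boolean editLogIsEmpty) throws java.io.IOException {
    if (!editLogIsEmpty) {
      throw new java.io.IOException(
          "downgrade requires a saved image and an empty edit log");
    }
    if (softwareVersion < imageCompatibleVersion) {
      throw new java.io.IOException("image requires software version >= "
          + imageCompatibleVersion + ", running " + softwareVersion);
    }
    // Safe to proceed: older software may lose new-feature metadata
    // (e.g. newly created ACLs) but no file data.
  }
}
{code}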
[jira] [Commented] (HDFS-8075) Revist layout version
[ https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483602#comment-14483602 ] Sanjay Radia commented on HDFS-8075: Here is a proposal: * Add a *second* layout version field (call it compatible-layout-version) that states which version can safely read the image without data-loss. ** The ACL example will state that the previous version can safely read the image without data loss. ** The truncate example is tricky: one can safely downgrade if the truncate operation was not used. We could add code to not allow such new features till finalize is done. Or we could say don't support downgrade for such a feature and simply mark the compatible-layout-version accordingly. Revist layout version - Key: HDFS-8075 URL: https://issues.apache.org/jira/browse/HDFS-8075 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: Sanjay Radia Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - they is likely to be a fairly fundamental change in layout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version
[ https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483654#comment-14483654 ] Sanjay Radia edited comment on HDFS-5223 at 4/7/15 6:29 PM: For the edits one could require that in order to downgrade you must do a save-image and then delete the (now null) edit log. We would then limit our solution to the image. For the image we could do the following: Add a *second* layout version field (call it compatible-layout-version) that indicates which version can safely read the image without data-loss. A NN that starts up will compare this field with its current layout version and then proceed as long as the edit log is null. * The ACL example (see Jira description) will state that the previous version can safely read the image without data loss. Of course newly created ACLs would be lost. * The truncate example is tricky: one can safely downgrade if the truncate operation was not used. We could add code to not allow such new features till finalize is done. This is somewhat analogous to what ext3 was trying to do with its superblock feature flags (see Todd's comment above); what I am proposing is slightly different since it limits such features till the upgrade is finalized, while ext3's approach is more general in that you can downgrade at any time as long as you have not used the feature. We could also do the following slight variation if finalize seems too arbitrary: once you use the new feature (e.g., truncate), simply change the compatible-layout-version to be the current one; this will safely prevent older binaries from reading it. Of course alternatively, we could simply not support downgrade for such a feature and simply mark the compatible-layout-version accordingly. was (Author: sanjay.radia): For the edits one could require that in order to downgrade you must do a save-image and then delete the null edits-log. We would then limit our solution to the image. For the image we could do the following * Add a *second* layout version field (call it compatible-layout-version) that indicates which version can safely read the image without data-loss. A NN that starts up will compare this field with its current layout version and then proceed as long as the edits is null. ** The ACL example (see Jira description) will state that the previous version can safely read the image without data loss. Of course newly created ACLs would be lost. ** Truncate example is tricky: one can safely downgrade if the truncate operation was not used. We could add code to not allow such new features till finalize is done. This is somewhat analogous to what ext3 was trying to do with its superblock feature flags (see Todd's comment above); what I am proposing is slightly different since it limits such features till upgrade is finalized while ext3's approach is more general in that you can downgrade at anytime as long as you have used the feature. Alternatively, we could simply not support downgrade for such a feature and simply mark the compatible-layout-version accordingly. Allow edit log/fsimage format changes without changing layout version - Key: HDFS-5223 URL: https://issues.apache.org/jira/browse/HDFS-5223 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Assignee: Colin Patrick McCabe Attachments: HDFS-5223.004.patch Currently all HDFS on-disk formats are version by the single layout version. 
This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full `namenode -upgrade' process which requires coordination with DNs, etc. HDFS should support a lighter weight alternative. Copied description from HDFS-8075 which is a duplicate and now closed. Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the
[jira] [Updated] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version
[ https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5223: --- Description: Currently all HDFS on-disk formats are version by the single layout version. This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full `namenode -upgrade' process which requires coordination with DNs, etc. HDFS should support a lighter weight alternative. Copied description from HDFS-8075 which is a duplicate and now closed. Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - they is likely to be a fairly fundamental change in layout. was:Currently all HDFS on-disk formats are version by the single layout version. This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full `namenode -upgrade' process which requires coordination with DNs, etc. HDFS should support a lighter weight alternative. Allow edit log/fsimage format changes without changing layout version - Key: HDFS-5223 URL: https://issues.apache.org/jira/browse/HDFS-5223 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Assignee: Colin Patrick McCabe Attachments: HDFS-5223.004.patch Currently all HDFS on-disk formats are version by the single layout version. This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full `namenode -upgrade' process which requires coordination with DNs, etc. HDFS should support a lighter weight alternative. Copied description from HDFS-8075 which is a duplicate and now closed. Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). 
* Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from
[jira] [Commented] (HDFS-8075) Revist layout version
[ https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483606#comment-14483606 ] Sanjay Radia commented on HDFS-8075: Oops. I will close this as duplicate. I will copy the description of this Jira to the HDFS-5223 since it has some good examples. Revist layout version - Key: HDFS-8075 URL: https://issues.apache.org/jira/browse/HDFS-8075 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: Sanjay Radia Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - they is likely to be a fairly fundamental change in layout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version
[ https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483704#comment-14483704 ] Sanjay Radia commented on HDFS-5223: The above solution was inspired by Hive's ORC. They have two complementary mechanisms for dealing with old and new binaries. They specify the oldest version that can safely read the new data (which inspired the solution I gave above), and new binaries can also write in the older format. This second mechanism is too burdensome for HDFS. Instead I would prefer to disable the new features till the upgrade is finalized, after which one cannot downgrade. Allow edit log/fsimage format changes without changing layout version - Key: HDFS-5223 URL: https://issues.apache.org/jira/browse/HDFS-5223 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Assignee: Colin Patrick McCabe Attachments: HDFS-5223.004.patch Currently all HDFS on-disk formats are version by the single layout version. This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full `namenode -upgrade' process which requires coordination with DNs, etc. HDFS should support a lighter weight alternative. Copied description from HDFS-8075 which is a duplicate and now closed. Background * HDFS image layout was changed to use Protobufs to allow easier forward and backward compatibility. * Hdfs has a layout version which is changed on each change (even if it an optional protobuf field was added). * Hadoop supports two ways of going back during an upgrade: ** downgrade: go back to old binary version but use existing image/edits so that newly created files are not lost ** rollback: go back to checkpoint created before upgrade was started - hence newly created files are lost. Layout needs to be revisited if we want to support downgrade is some circumstances which we dont today. Here are use cases: * Some changes can support downgrade even though they was a change in layout since there is not real data loss but only loss of new functionality. E.g. when we added ACLs one could have downgraded - there is no data loss but you will lose the newly created ACLs. That is acceptable for a user since one does not expect to retain the newly added ACLs in an old version. * Some changes may lead to data-loss if the functionality was used. For example, the recent truncate will cause data loss if the functionality was actually used. Now one can tell admins NOT use such new such new features till the upgrade is finalized in which case one could potentially support downgrade. * A fairly fundamental change to layout where a downgrade is not possible but a rollback is. Say we change the layout completely from protobuf to something else. Another example is when HDFS moves to support partial namespace in memory - they is likely to be a fairly fundamental change in layout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6200) Create a separate jar for hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345657#comment-14345657 ] Sanjay Radia commented on HDFS-6200: +++1 for this proposal.
Create a separate jar for hdfs-client - Key: HDFS-6200 URL: https://issues.apache.org/jira/browse/HDFS-6200 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6200.000.patch, HDFS-6200.001.patch, HDFS-6200.002.patch, HDFS-6200.003.patch, HDFS-6200.004.patch, HDFS-6200.005.patch, HDFS-6200.006.patch, HDFS-6200.007.patch
Currently the hadoop-hdfs jar contains both the hdfs server and the hdfs client. As discussed in the hdfs-dev mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201404.mbox/browser), downstream projects are forced to bring in additional dependencies in order to access hdfs. The additional dependencies can sometimes be difficult to manage for projects like Apache Falcon and Apache Oozie.
This jira proposes to create a new project, hadoop-hdfs-client, which contains the client side of the hdfs code. Downstream projects can use this jar instead of hadoop-hdfs to avoid the unnecessary dependency. Note that it does not break the compatibility of downstream projects. This is because old downstream projects implicitly depend on hadoop-hdfs-client through the hadoop-hdfs jar.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7745) HDFS should have its own daemon command and not rely on the one in common
Sanjay Radia created HDFS-7745: -- Summary: HDFS should have its own daemon command and not rely on the one in common Key: HDFS-7745 URL: https://issues.apache.org/jira/browse/HDFS-7745 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sanjay Radia
HDFS should have its own daemon command and not rely on the one in common. BTW Yarn split out its own daemon command during the project split. Note the hdfs command does have a --daemon flag, and hence the daemon script is merely a wrapper.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111655#comment-14111655 ] Sanjay Radia commented on HDFS-6469: My thoughts:
* I do believe that a Paxos-based NN would give faster failover than what NN HA offers today (30 seconds to a few minutes, but typically no more than a minute or two). So this is clearly a benefit of CNode, though I have not heard a single customer complain about the failover time so far.
* The proposed solution does not increase the write throughput.
* The parallel-reads advantage of CNode can be achieved in the current HA setup with some work (this is discussed above). If this is the main benefit, then I would rather pursue enhancing the NN standby to support reads. Further, there is ongoing work to improve the locking in the NN.
* I share Todd's view that ZK is not a usable reference implementation for Paxos. One really needs a Paxos library that can be plugged in rather than an external server-based solution like ZK.
So at this stage I am having a hard time seeing the benefits to justify the costs of adding this complexity. I do however understand the overhead that Wandisco faces in integrating their solution with HDFS each time HDFS is modified. Would a few plugin interfaces make it easier? I would be more than happy to support adding such plugins if they would help.
Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf
This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099243#comment-14099243 ] Sanjay Radia commented on HDFS-6134: We have made very good progress over the last few days. Thanks for taking the time for the offline technical discussions. Below is a summary of the concerns I have raised previously in this Jira.
# Fix distcp and cp to *automatically* deal with EZs using /r/r internally (see the sketch after this comment). Initially we need to support only row 1 and row 4 in the table I attached in HADOOP-10919.
# Fix webhdfs to use KMS delegation tokens so that webhdfs can be used with transparent encryption without giving user hdfs KMS proxy permission (and, as a result, giving it to admins). REST is a key protocol for HDFS and for many Hadoop use cases; an admin should not have access to the keys of encrypted files.
# Further work on specifying what HAR should do (I have listed some use cases and proposed solutions), and then follow it up with a fix to har.
# Some work on understanding availability and scalability of the KMS for medium to large clusters. Perhaps we need to explore getting the keys ahead of time when a job is submitted.
Let's complete items 1 and 2 promptly. Before we publish transparent encryption in a 2.x release for public consumption, let us at least complete item 1 (i.e. distcp and cp) and the flag to turn this feature on/off.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
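The sketch referenced in item 1 above: what copying through a raw namespace might look like, assuming the /.reserved/raw convention from the HDFS-6509 distcp design. The paths are illustrative, and a real tool must also preserve the raw.* xattrs that carry each file's encryption info (as distcp -px would):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative raw copy: reading under /.reserved/raw returns the
 * ciphertext as stored, so the file is copied without ever being
 * decrypted and without the copying user needing any key access.
 */
public class RawCopySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path src = new Path("/.reserved/raw/zone1/file");        // hypothetical EZ path
    Path dst = new Path("/.reserved/raw/backup/zone1/file"); // hypothetical target

    // Bytes move as-is; note FileUtil.copy alone does not carry the
    // raw.* xattrs holding the EDEK, which the real fix must also copy.
    FileUtil.copy(fs, src, fs, dst, false /* keep source */, conf);
  }
}
{code}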
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099311#comment-14099311 ] Sanjay Radia commented on HDFS-6134: Alejandro, wrt the subtle difference between webhdfs and httpfs: can an admin grab the EDEKs and raw files, then log into the httpfs machine, become user httpfs, and trick the KMS into decrypting the keys because httpfs has the proxy setting?
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096614#comment-14096614 ] Sanjay Radia commented on HDFS-6134: I get your point about client-side code for webhdfs. I do agree that httpfs is a proxy, but do you want it to have blanket access to all keys? My main concern is that this jira completely breaks webhdfs. Do you find that acceptable? There are so many users of this protocol. BTW did you see my earlier attempt at a solution (13.06 today) - does that work?
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096617#comment-14096617 ] Sanjay Radia commented on HDFS-6134: Alejandro, can you please summarize your explanation for why, during file creation, the NN requests the KMS to create a new EDEK rather than having the client do it. Suresh raised the same concern that I did at our meeting yesterday. Thanks
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097330#comment-14097330 ] Sanjay Radia commented on HDFS-6134: Had a chat with Owen over the webhdfs issue and the solution I had proposed in [comment | https://issues.apache.org/jira/browse/HDFS-6134?focusedCommentId=14096027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14096027]. He said that restricting client connections from user hdfs is not necessary: the DN does a doAs(user). The KMS is configured with hdfs as a proxy, but it also blacklists hdfs (and other superusers). That is, the DN as a proxy cannot get a key for hdfs, but it can get the keys for other users. This brings the httpfs and webhdfs solutions to the same place. Owen proposed another solution where the httpfs or DN daemons do *not* need to be trusted proxies for the KMS: the user simply passes a KMS delegation token in the REST request (we already pass HDFS delegation tokens).
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
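A sketch of the doAs pattern Owen described, using the real UserGroupInformation proxy-user API; the kmsDecrypt stub and the blacklist behaviour noted in the comments are assumptions about the KMS configuration, not working code:
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Sketch: the DN runs as the hdfs service user but performs the KMS
 * decrypt call as the end user via doAs, so the KMS authorizes against
 * the end user.  On the KMS side, hdfs is configured as a proxyuser but
 * is itself blacklisted from decrypting EDEKs, so logging in as hdfs
 * and asking directly gains nothing.
 */
public class ProxyDecryptSketch {
  static byte[] decryptAsUser(String endUser, byte[] edek) throws Exception {
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        endUser, UserGroupInformation.getLoginUser());
    return proxyUgi.doAs(
        (PrivilegedExceptionAction<byte[]>) () -> kmsDecrypt(edek));
  }

  // Placeholder for the KMS decryptEncryptedKey round trip.
  private static byte[] kmsDecrypt(byte[] edek) {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
{code}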
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097610#comment-14097610 ] Sanjay Radia commented on HDFS-6134: Context: making things work for cp, distcp, har, etc. Is the following true: the EZ master key (EZKey) is only needed for file creation in an EZ subtree; after that, for reading or appending to a file, one simply needs the file's individual key. If that is true then one can copy raw encrypted files and their keys from an EZ to tape, har, tar, etc. and then restore them later, and things would just work. Also, can one copy raw encrypted files and their keys from an EZ to another EZ which has a different EZKey, and again things would work?
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
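If the answers are yes, the read path would only ever involve the per-file key, roughly as below (a sketch against the KeyProviderCryptoExtension API; stream construction and error handling are omitted):
{code:java}
import org.apache.hadoop.crypto.key.KeyProvider.KeyVersion;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.EncryptedKeyVersion;

/**
 * Sketch of decrypt-on-read: the NN hands the client the file's EDEK,
 * the KMS unwraps it to the per-file DEK, and the EZ master key never
 * leaves the KMS - the property the copy/restore questions above
 * depend on.
 */
public class ReadKeySketch {
  static byte[] dekFor(KeyProviderCryptoExtension kms,
                       EncryptedKeyVersion edek) throws Exception {
    KeyVersion dek = kms.decryptEncryptedKey(edek); // one KMS round trip
    return dek.getMaterial();  // plaintext DEK, feeds a CryptoInputStream
  }
}
{code}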
[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097687#comment-14097687 ] Sanjay Radia edited comment on HDFS-6134 at 8/14/14 9:23 PM: - Some thoughts on the Har use cases and possible outcomes:
1) Har a subtree and the subtree contains an EZ.
2) Har a subtree rooted at the EZ.
3) Har a subtree within an EZ.
Typically the subtree is replaced by the har itself, though that is not required. The har is read-only. The operation can be performed by an admin or by a user.
Use case 1 - copy the raw files and the keys into the HAR (i.e. the files inside the HAR remain encrypted). When files are accessed from the Har filesystem, the same machinery as for HDFS EZs should come into play to allow transparent decryption of the files. A user with no KMS permission will not be able to decrypt. Someone with read access to the HAR will be able to get to the raw files and their keys (how does this compare to normal HDFS EZs?).
Use case 2 - same as 1.
Use case 3 - if the har is copied elsewhere (i.e. it does not replace the subtree), then same as 1. If it does replace the subtree, the HAR will be encrypted once again (i.e. double encryption).
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097875#comment-14097875 ] Sanjay Radia commented on HDFS-6134: bq. Had a chat with Owen over the webhdfs issue and the solution I had proposed in comment. He said that restricting client connections from user hdfs is not necessary: the DN does a doAs(user). KMS is configured for hdfs to be a proxy but it also blacklists hdfs (and other superusers). That is, the DN as a proxy cannot get a key for hdfs but it can get the keys for other users. So this brings the httpfs and webhdfs solutions to the same place.
The above does not work: an admin can log in as hdfs, pretend to be the NN/DN, and use the proxy privilege to get DEKs from EDEKs (an admin can read EDEKs easily). (Alejandro - thanks for the explanation - I finally get the distinction between webhdfs and httpfs.)
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095204#comment-14095204 ] Sanjay Radia commented on HDFS-6134: Larry, I don't completely get the difference between webhdfs and httpfs, but I think the cause of the difference is that user hdfs is a superuser (note the DN runs as hdfs, and webhdfs code is executed on behalf of the end-user inside the DN after checking the permissions). Hence I think this would potentially open up access to all encrypted files that are readable. However, that should NOT happen if doAs is used (correct?). I agree it would be unacceptable to say that if one enables transparent encryption then one should disable webhdfs because it would become insecure. Andrew says "Regarding webhdfs, it's not a recommended deployment", but Alejandro says "Both httpfs and webhdfs will work just fine" and then in the same paragraph says this could fail some security audits.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096027#comment-14096027 ] Sanjay Radia edited comment on HDFS-6134 at 8/13/14 8:19 PM: - Alejandro, a potential solution: treat user hdfs as a special user such that the HDFS system will NOT accept any client connections from hdfs. An admin will not be able to connect as user hdfs but can connect as, say, user ClarkKent, where ClarkKent is in the superuser group of hdfs so that the admin can do his job as superuser. It does mean that we are trusting the HDFS code to be correct in not abusing its access to keys, since it has proxy authority with the KMS (this was not required so far).
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096570#comment-14096570 ] Sanjay Radia commented on HDFS-6134: bq. If you set up httpfs, it runs using the 'httpfs' user, a HDFS regular user configured as proxyuser to interact with HDFS and KMS doing doAs calls
Alejandro, we modified the original design in this Jira so that the NN is not a proxy for the keys; instead the client gets the keys directly from the KMS, because the best practice in encryption is to eliminate proxies (see Owen's comment of June 11). With your proposal for httpfs, the httpfs server is a proxy to get the keys. Perhaps we are approaching the problem wrong. Consider the following alternative: let webhdfs and httpfs simply send the encrypted raw data to the client. For the hdfs-native filesystem, the encryption and decryption happen on the client side; we should consider the same for the REST protocol (see the sketch after this comment). Clearly it requires more code on the REST client side. BTW the webhdfs FileSystem (as opposed to the REST protocol being discussed) has a client-side library that can mimic the HDFS filesystem's client side.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
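A sketch of the alternative proposed above - the server returns raw ciphertext and a smarter REST client decrypts locally - using the CryptoCodec/CryptoInputStream classes from the Hadoop crypto work; how the client obtains the DEK and IV (e.g. from the KMS with a delegation token) is assumed, not shown:
{code:java}
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.CryptoCodec;
import org.apache.hadoop.crypto.CryptoInputStream;

/**
 * Sketch: wrap the raw HTTP response body in a CryptoInputStream so
 * decryption happens on the client side, mirroring the native DFSClient
 * instead of decrypting inside the DN or the httpfs proxy.
 */
public class RestDecryptSketch {
  static InputStream decrypting(InputStream rawHttpBody,
                                byte[] dek, byte[] iv) throws Exception {
    CryptoCodec codec = CryptoCodec.getInstance(new Configuration());
    return new CryptoInputStream(rawHttpBody, codec, dek, iv);
  }
}
{code}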
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094485#comment-14094485 ] Sanjay Radia commented on HDFS-6134: bq. Regarding webhdfs, it's not a recommended deployment.
The design document in this jira already states that webhdfs just works:
* This Jira provides encryption for HDFS data at rest and allows any application to access it via the Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API.
* For WebHDFS, the DataNodes act as the HDFS client reading/writing files since that is where encryption/decryption will happen. For HttpFS, the HttpFS server acts as the HDFS client reading/writing files, since that is where encryption/decryption will happen.
webhdfs not working is worrying because REST is used by many users who do not want to deploy Hadoop binaries or want to use a non-Java client. Also, I do not understand why httpfs works and webhdfs breaks. Neither will be running as the end-user and hence neither will allow transparent encryption. Am I missing something?
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094527#comment-14094527 ] Sanjay Radia commented on HDFS-6134: bq. Regarding HAR, could you lay out the usecase ...
Alejandro summarized the problem, and also the solution of modifying har, in his comment of June 24th: https://issues.apache.org/jira/browse/HDFS-6134?focusedCommentId=14042797page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14042797
Andrew, you are missing one of the usage models of HAR: the user creating the har is not the only user accessing the har - har is a general tool used by an admin to compact files and replace the original. I can think of at least the following use cases so far:
* A subtree being har'ed has a subtree that is an EZ - some files in the har will be encrypted and some will not. The reader should be able to transparently read each of the two kinds.
* A subtree being har'ed is part of a subtree that is an EZ - the whole har should be encrypted and transparently decrypted when its contents are read.
* A user har's a non-EZ subtree and copies it into an EZ - this should just work; as you suggest, the whole thing is encrypted, and reading the har requires that the user has access to the keys.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094613#comment-14094613 ] Sanjay Radia edited comment on HDFS-6134 at 8/13/14 4:58 AM: - Alejandro - for both webhdfs and httpfs to work, your proposal is that users hdfs and httpfs have access to any key (you mention only webhdfs in your comment but I suspect you meant both). However, with this approach webhdfs and httpfs will give access to ALL EZ files to users that have read access. Correct? This would be unacceptable. I believe the better solution is for webhdfs and httpfs to access the file by doing a doAs(endUser).
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093538#comment-14093538 ] Sanjay Radia edited comment on HDFS-6134 at 8/12/14 12:19 AM: -- bq. Charles posted a design doc for how distcp will work with encryption at HDFS-6509.
I did a quick glance over it. We also need to do the same for har. I think the same .raw should work ...
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093579#comment-14093579 ] Sanjay Radia commented on HDFS-6134: Wrt webhdfs, the document says that the decryption/encryption will happen in the DataNode.
* Will the DN be able to access the key necessary to do this?
* The data will be transmitted in the clear - is that what we want? For the normal HDFS API the decryption/encryption happens at the client side.
* There are two aspects to webhdfs: the REST client and the webhdfs FileSystem. Have you considered both use cases?
* Will distcp work via webhdfs? Customers often use webhdfs instead of hdfs for cross-cluster copies.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093583#comment-14093583 ] Sanjay Radia commented on HDFS-6134: One of the items raised at the meeting, and summarized by Owen in his meeting minutes comment (June 26), is the scalability concern. How is that being addressed? Can a job client get the keys prior to job submission?
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 3.0.0, 2.3.0 Reporter: Alejandro Abdelnur Assignee: Charles Lamb Attachments: HDFS-6134.001.patch, HDFS-6134.002.patch, HDFS-6134_test_plan.pdf, HDFSDataatRestEncryption.pdf, HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062502#comment-14062502 ] Sanjay Radia commented on HDFS-6469: bq. I meant that if you use QJM then every update on the NameNode results in writing into two journals: first into edits log and then into QJM journal.
Konstantine, HDFS has supported parallel journals (i.e. multiple edit logs) for a long time; they are written in parallel. A customer can use just QJM (which gives at least 3 replicas) and can optionally have a local parallel editlog if they want additional redundancy. What you are proposing is dual *serial* journals.
Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf
This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine.
-- This message was sent by Atlassian JIRA (v6.2#6252)
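A sketch of the distinction being drawn: with parallel journals, one logical logEdit() writes the same transaction to every configured journal (QJM plus an optional local edits dir) in a single logging step, whereas dual serial journals would commit to one journal and then separately to a second. The classes are illustrative, not HDFS's actual JournalSet:
{code:java}
import java.io.IOException;
import java.util.List;

// Illustrative stand-in for an edits journal (QJM, local directory, ...).
interface Journal {
  void write(long txId, byte[] op) throws IOException;
}

/** Parallel fan-out: one transaction, every journal, one logging step. */
class ParallelJournalsSketch {
  private final List<Journal> journals;

  ParallelJournalsSketch(List<Journal> journals) {
    this.journals = journals;
  }

  void logEdit(long txId, byte[] op) throws IOException {
    for (Journal j : journals) {
      j.write(txId, op); // same txn to each journal; no second, serial commit
    }
  }
}
{code}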
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062559#comment-14062559 ] Sanjay Radia commented on HDFS-6469: Todd said:
bq. a fully usable solution would be available to the community at large, whereas the design you're proposing seems like it will only be usably implemented by a proprietary extension (I don't consider the ZK reference implementation likely to actually work in a usable fashion).
Konstantine, I had mentioned exactly this point to you at Hadoop Summit Europe. ZK is a coordination service, and for this to be practical it needs to be an inline Paxos protocol. We had also discussed 2 potential Paxos libraries that could come into open source: I believe Facebook has one that they may contribute, and CMU has one called E-Paxos; if either of these becomes available then it addresses this particular issue. I have no objections to a customer going to Wandisco for the enterprise-supported version, but if the community is going to maintain such an extension then there needs to be a practical, in-production-usable free solution; sending offline messages to a coordinator service for each transaction is not usable. Let's discuss the performance part in a separate comment.
Let me comment on your comparisons to the topology and Windows examples that the community supported in the past:
* Topology - these changes allowed Hadoop to be used on containers such as VMs.
** Both KVM and VirtualBox offer free VM solutions - the customer does not need to buy ESX.
** The topology solution would also help with a Docker container deployment, which is freely available and offers even better performance than VMs.
** Hadoop is commonly used in cloud environments (e.g. AWS, Azure, or Altiscale) which all use VMs or containers.
** Further, it was recognized that while in the past we had considered racks to be a failure zone, there could be other failure zones: nodes (for the case of VMs or containers on a host) and also groups of machines.
* Windows - this was done for platform support, which is very different from what we are talking about here; many open source solutions support multiple platforms to enable the widest adoption. BTW Hadoop supported Windows via Cygwin, but we made it first class since the initial support via Cygwin was messy.
Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf
This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049227#comment-14049227 ] Sanjay Radia commented on HDFS-6469: Wrt double journaling:
bq. If I follow your logic correctly QJM being Paxos-based uses a journal by itself, so we are not increasing journaling here. When you look at the bigger picture we see more journals around. HBase uses WAL along with NN edits, which by itself persisted in ext4 a journaling file system.
Konstantine, Todd's point is not that there are multiple journals in the system, but that every update operation of the NN will result in an entry in two journals: HDFS's edit log and the journal used by the ConsensusNode Paxos protocol. Your example of HBase's log and the NN log is not a good comparison: every write to the HBase WAL does NOT result in an HDFS editlog entry - an entry is made in the HDFS editlog ONLY when the WAL crosses a block boundary.
Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf
This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045415#comment-14045415 ] Sanjay Radia commented on HDFS-6134: Noticed the rename restriction for encryption zones. In the past, rename was one of the main objections to volumes (i.e. volumes should not restrict renames). I think we should bite the bullet and introduce the notion of volumes, and use encryption as the first use case for volumes (i.e. an encryption zone becomes an encrypted volume). Snapshots can also benefit from a volume rename restriction, because rename across snapshots is very hard to support.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042527#comment-14042527 ] Sanjay Radia commented on HDFS-6134: bq. Can you be a bit more specific on HAR breaking?
Har copies subtree data into a tar-like structure. Har lets you access the individual files transparently - all the work is done on the client side -- the NN is not involved and hence will not be able to hand out the encrypted keys or key versions. It is possible that Har can be changed to work, but I am merely pointing out that I don't think har will work as-is with the changes proposed in this Jira.
Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042636#comment-14042636 ] Sanjay Radia commented on HDFS-6134: Alejandro - sorry, I should have explained the HAR example better: consider a subtree which has a file called E that is encrypted while the rest of the files are normal. Now the user decides to har the subtree. The file E needs to remain encrypted inside the har; also, when E is accessed from the har, it needs to be transparently decrypted. BTW this might be fixable by changing Har. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042789#comment-14042789 ] Sanjay Radia commented on HDFS-6134: bq. The NN gives the HDFS client the encrypted DEK \[unique data encryption key of the file\] and the keyVersion ID Alejandro - isn't it sufficient to hand out a keyname rather than the encrypted DEK? Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
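For readers following the DEK discussion, below is a minimal pure-JDK sketch of envelope encryption - an illustration of the general technique only, not the proposal's KMS API. The KMS holds the zone key (KEK); the NN stores and hands out only the DEK wrapped under the KEK; an authorized client asks the KMS to unwrap it. This also shows why handing out the encrypted DEK does not let the NN read the data: the NN never holds the KEK.

{code:java}
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Minimal envelope-encryption sketch (illustration only, not the proposal's KMS API). */
public class EnvelopeDemo {
    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);

        SecretKey kek = gen.generateKey(); // zone key, held only by the KMS
        SecretKey dek = gen.generateKey(); // per-file data encryption key

        // KMS side: wrap the DEK under the KEK; the NN stores only this blob.
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, kek);
        byte[] encryptedDek = wrap.wrap(dek);

        // Client side: present the blob to the KMS, which unwraps it if authorized.
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, kek);
        SecretKey recovered = (SecretKey) unwrap.unwrap(encryptedDek, "AES", Cipher.SECRET_KEY);

        System.out.println("DEK recovered: "
                + java.util.Arrays.equals(dek.getEncoded(), recovered.getEncoded()));
    }
}
{code}

Handing out only a key name, as the comment above asks, would instead require the key store to resolve per-file key material itself; the encrypted-DEK design keeps the per-file blob in HDFS metadata.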
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040873#comment-14040873 ] Sanjay Radia commented on HDFS-6134: Aaron said: bq. distcp... I disagree - this is exactly what one wants .. So you are saying that distcp should decrypt and re-encrypt data as it copies it ... most backup tools do not do this as they copy data - it costs extra CPU resources and adds a further, unneeded vulnerability. There are customer use cases where distcp does not run over an encrypted channel; hence if one of the files being copied is encrypted, one may not want the file to be transparently decrypted before it is sent. Further, a sensitive file in a subtree may have been encrypted because the subtree is readable by a larger group, and hence the distcp user may not have access to the keys. bq. delegation tokens - KMS ... Owen and Tucu have already discussed this quite a bit above Turns out this issue came up in discussion with Owen, and he shares the concern and suggested that I post it. Besides, even if Alejandro and Owen are in agreement, my question is relevant and has not been raised so far above: Encryption is used to overcome limitations of authorization and authentication in the system. It is relevant to ask if the use of delegation tokens to obtain keys adds a weakness. bq. meeting ... Aaron .. you are misunderstanding my point. I am not saying that the discussions on this jira have not been open. * See Alejandro's comments: Todd Lipcon and I had an offline discussion with Andrew Purtell, Yi Liu and Avik Dey and After some offline discussions with Yi, Tianyou, ATM, Todd, Andrew and Charles ... ** there have been such meetings and I have *no objections* to such private meetings because I know that the bandwidth helps. I am merely asking for one more meeting where I can quickly come up to speed on the context that Alejandro, Todd, Yi, Tianyou, Andrew, ATM share. It will help me and others better understand the viewpoint that some of you share due to previous high-bandwidth meetings. ** There is precedent for HDFS meetings in spite of open jira discussion - higher bandwidth helps progress faster. ** Perhaps I should have worded 'private meetings' differently ... sorry if it came across the wrong way. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041429#comment-14041429 ] Sanjay Radia commented on HDFS-6134: I believe the transparent encryption will break the HAR file system. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041449#comment-14041449 ] Sanjay Radia commented on HDFS-6134: bq. Vanilla distcp will just work with transparent encryption. Data will be decrypted on read and encrypted on write, assuming both source and target are in encrypted zones. ...The proposal on changing distcp is to enable a second use case. Alejandro, Aaron: the general practice is not to give the admins running distcp access to keys. Hence, as you suggest, we could change distcp so that it does not use transparent decryption by default; however, there may be other such backup tools and applications that customers and other vendors may have written, and we would be breaking them. This may also break the HAR filesystem. Aaron, you took a very strong position that transparent decryption/re-encryption is exactly what one wants. I am missing this - what are the use cases for distcp where one wants transparent decryption/re-encryption? Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037737#comment-14037737 ] Sanjay Radia commented on HDFS-6134: bq. On the distcp not accessing the keys (not decrypting/encrypting), yes, that is the idea. Alejandro, not sure if I understand what you mean by the above. Are you saying that distcp and other tools/applications that copy and back up data will have to be changed to do something different when the file is encrypted? In a sense, this Jira's attempt to provide transparent encryption is breaking existing transparency. Two other questions: * Are you relying on the Kerberos credentials OR delegation tokens to obtain the keys? Isn't using the delegation token to obtain keys reducing security? * Looks like the proposal relies on file ACLs to hand out keys - part of the motivation for using encryption is that ACLs are often not correctly set. Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038099#comment-14038099 ] Sanjay Radia commented on HDFS-6134: * distcp and such tools and applications bq. Vanilla distcp will just work with transparent encryption. This is not what one wants - distcp will not necessarily have permission to decrypt. * delegation tokens - KMS will accept delegation tokens - again, I don't think this is what one wants - can the keys be obtained at job submission time? * File ACLs bq. The NN gives the HDFS client the encrypted DEK and the keyVersion ID. I assume the NN will hand this out based on the file ACL. Does the above reduce security? Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038106#comment-14038106 ] Sanjay Radia commented on HDFS-6134: There is a complex set of issues to be addressed. I know that a bunch of you have had some private meetings discussing the various options and tradeoffs. Can we please have a short, more public meeting next week? I can organize and host this at Hortonworks, along with Google Plus for those who are remote. How about next Thursday at 1:30pm? Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataAtRestEncryption.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5851: --- Attachment: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Slightly updated doc on DDM: * Clarified the separation of mechanism from DDM policy, as per Colin's and my comments on being able to create memory-cached files anywhere in the HDFS namespace. * Explained how DDMs fit with materialized queries (Julian's DIMMQ). * Updated references marked TBD. * Minor improvements to the RDD/Tachyon comparison. Note this text was written prior to Ali's and Li's comments and hence does not address their concern. I am rereading the Tachyon paper, will be meeting Li over the next couple of days, and will update further as needed. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034632#comment-14034632 ] Sanjay Radia commented on HDFS-5851: Colin wrote: bq. why a separate namespace under hdfs://namespace/.reserved/ddm ? We have xattrs now, so files ... I did not explain it well. It is a separation of policy and mechanism. HDFS has to support such files for ANY name. Hence we can use an xattr to mark files as write-cached. The policy of managing the memory space and the underlying swap space (e.g. hdfs://namespace/.reserved/ddm) is separate from the write-cache mechanism that HDFS needs to support in ANY part of its namespace; so I believe we are in agreement here. I will explain the policy I am proposing in a separate comment. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
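As a concrete illustration of the mechanism/policy split described above: the mechanism can be a per-file xattr that any path may carry, with policy code elsewhere deciding which files get it. A minimal sketch follows; the attribute name and path are invented for illustration, while FileSystem.setXAttr/getXAttr are the stock HDFS xattr calls.

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch of the mechanism/policy split: any file, anywhere in
 * the namespace, can be tagged as a DDM write-cache candidate via an xattr
 * (mechanism); a separate policy component decides which files to tag.
 * The attribute name "user.ddm.writeCache" is invented for illustration.
 */
public class DdmTagDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/alice/tmp/intermediate-0001"); // assumed to exist

        // Mechanism: mark the file as memory-cached on write.
        fs.setXAttr(file, "user.ddm.writeCache",
                "true".getBytes(StandardCharsets.UTF_8));

        // Policy (living elsewhere) can read the tag back when placing replicas.
        byte[] v = fs.getXAttr(file, "user.ddm.writeCache");
        System.out.println("ddm tag = " + new String(v, StandardCharsets.UTF_8));
    }
}
{code}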
[jira] [Commented] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984544#comment-13984544 ] Sanjay Radia commented on HDFS-5851: BTW we will host the meeting at Hortonworks for those that are local and want to attend in person: Hortonworks 3460 W. Bayshore Rd Palo Alto CA 94303 Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981608#comment-13981608 ] Sanjay Radia edited comment on HDFS-5851 at 4/28/14 10:23 PM: -- Added comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon I believe uses RamFs rather than memory that is mapped to an HDFS file -- but need to verify that). I have reproduced the text from the updated doc here for convenience: Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is outside the address space of an application and allows sharing RDDs across applications. Both Tachyon and DDMs use memory mapped files and lazy writing to reduce the need to recompute. Tachyon, since it is an RDD implementation, records the computation in order to regenerate the data in case of loss, whereas DDMs rely on the application to regenerate. Tachyon and RDDs do not have a notion of discardability, which is fundamental to DDMs where data can be discarded when it is under memory and/or backing store pressure. DDMs are closest to virtual memory/anti-caching in that they virtualize memory, with the twist that data can be discarded. was (Author: sanjay.radia): Added comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon I believe uses RamFs rather than memory that is mapped to an HDFS file -- but need to verify that). I have reproduced the text from the updated doc here for convenience: Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is outside the address space of an application and allows sharing RDDs across applications. Both Tachyon and DDMs use memory mapped files and lazy writing to reduce the need to recompute. Tachyon, since it is an RDD implementation, records the computation in order to regenerate the data in case of loss, whereas DDMs rely on the application to regenerate. Tachyon and RDDs do not have a notion of discardability, which is fundamental to DDMs where data can be discarded when it is under memory and/or backing store pressure. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5851: --- Attachment: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Added comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon I believe uses RamFs rather than memory that is mapped to an HDFS file -- but need to verify that). Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981608#comment-13981608 ] Sanjay Radia edited comment on HDFS-5851 at 4/25/14 9:10 PM: - Added comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon I believe uses RamFs rather than memory that is mapped to an HDFS file -- but need to verify that). I have reproduced the text from the updated doc here for convenience: Recently, Spark has added an RDD implementation called Tachyon [4]. Tachyon is outside the address space of an application and allows sharing RDDs across applications. Both Tachyon and DDMs use memory mapped files and lazy writing to reduce the need to recompute. Tachyon, since it is an RDD implementation, records the computation in order to regenerate the data in case of loss, whereas DDMs rely on the application to regenerate. Tachyon and RDDs do not have a notion of discardability, which is fundamental to DDMs where data can be discarded when it is under memory and/or backing store pressure. was (Author: sanjay.radia): Added comparison to Tachyon in the doc. There is also an implementation difference that I don't cover (Tachyon I believe uses RamFs rather than memory that is mapped to an HDFS file -- but need to verify that). Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf, SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5851) Support memory as a storage medium
[ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5851: --- Attachment: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Please see the attached document, which identifies some use cases and a proposal for using memory for intermediate data. We introduce the notion of Discardable Distributed Memory (DDM), which exploits the property that the data can be reconstructed. Further, by using HDFS files as a backing store to which DDM data is lazily written, we give the impression of a much larger memory size and also give the system an additional degree of freedom to manage the scarce memory resource. The main implementation mechanism is memory-mapped files that are lazily replicated; this mechanism provides weak persistence, which may have other direct use cases beyond DDMs. Support memory as a storage medium -- Key: HDFS-5851 URL: https://issues.apache.org/jira/browse/HDFS-5851 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf Memory can be used as a storage medium for smaller/transient files for fast write throughput. More information/design will be added later. -- This message was sent by Atlassian JIRA (v6.2#6252)
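The memory-mapped-file mechanism the document leans on can be sketched with stock JDK APIs. The following is a toy single-machine analogue, not the DDM design itself: writes land in mapped pages at memory speed, and flushing to the backing file is deferred (the file name is invented for illustration).

{code:java}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

/**
 * Toy single-machine analogue of the DDM mechanism (not the actual design):
 * a memory-mapped file gives memory-speed writes, while writing the pages
 * back to the backing file is deferred until the system chooses to persist.
 */
public class LazyMmapDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("ddm-segment.bin", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

            // Writes go to mapped pages at memory speed.
            buf.put("intermediate result".getBytes(StandardCharsets.UTF_8));

            // Lazy persistence: the OS may write pages back whenever it likes;
            // force() is the explicit "persist now" point. A discardable-memory
            // manager could instead drop the pages under memory pressure and
            // rely on the application to regenerate the data.
            buf.force();
        }
    }
}
{code}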
[jira] [Commented] (HDFS-6160) TestSafeMode occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964250#comment-13964250 ] Sanjay Radia commented on HDFS-6160: +1 TestSafeMode occasionally fails --- Key: HDFS-6160 URL: https://issues.apache.org/jira/browse/HDFS-6160 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Arpit Agarwal Attachments: HDFS-6160.01.patch From https://builds.apache.org/job/PreCommit-HDFS-Build/6511//testReport/org.apache.hadoop.hdfs/TestSafeMode/testInitializeReplQueuesEarly/ :
{code}
java.lang.AssertionError: expected:<13> but was:<0>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:212)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5477) Block manager as a service
[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920076#comment-13920076 ] Sanjay Radia commented on HDFS-5477: I read both documents; some key details are missing (perhaps in the patches, but they need to be in the document or jira). * How does the BM know the replication factor of a block? The document tends to suggest that the BM syncs with the NN on its start. Is this a sort of reverse block report from NN to BM, where the NN tells the BM its list of blocks and their replication factors? ** In particular, one would like the BM's state to exist independently of the NN so that if one or more NNs are shut down for long periods of time (such as when unmounting a namespace), blocks and their replicas are still managed. Would this be possible under your design? If not, what would it take to support it? * How would one support file affinity? For example, there are several use cases (esp. for HBase) where one co-locates the block replicas of multiple files together. How would you support such a feature? * You briefly reference fine-grained locking in the NN - does your design require that the NN hold certain fine-grained locks while threads in the NN make a remote call to the BM? Can you please detail these in the context of specific operations in the document and/or jira. * How do you plan to handle files and blocks under construction? There is delicate code that finalizes the sizes of blocks under construction, especially in the face of NN and DN restarts. * Please describe how you would support the Balancer in your new design. * Your design does not change client APIs and hence you must be forwarding block location requests via the NN to the BM. Can you describe how one could direct clients to go directly to the BM for block locations when appropriate? (I believe this can be done in an API-compatible way and also, with protocol changes, in a backward-compatible way.) Block manager as a service -- Key: HDFS-5477 URL: https://issues.apache.org/jira/browse/HDFS-5477 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf, patches.tar.gz The block manager needs to evolve towards having the ability to run as a standalone service to improve NN vertical and horizontal scalability. The goal is reducing the memory footprint of the NN proper to support larger namespaces, and improve overall performance by decoupling the block manager from the namespace and its lock. Ideally, a distinct BM will be transparent to clients and DNs. -- This message was sent by Atlassian JIRA (v6.2#6252)
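To anchor these questions, here is one purely hypothetical shape such a block-manager protocol could take; every name below is invented and nothing here is taken from the attached proposals or patches. The "reverse block report" in the first question would correspond to something like registerNamespace.

{code:java}
import java.util.List;
import java.util.Map;

/**
 * Purely hypothetical sketch of a standalone block-manager protocol, written
 * only to make the review questions concrete; all names are invented.
 */
interface BlockManagerService {
    /**
     * "Reverse block report": on (re)start the NN tells the BM which blocks
     * belong to its namespace and the expected replication of each, so the
     * BM's state can outlive any single NN.
     */
    void registerNamespace(String blockPoolId, Map<Long, Short> expectedReplication);

    /** Clients (or the NN on their behalf) resolve blocks to DataNodes. */
    List<String> getLocations(long blockId);

    /** Pipeline support for blocks under construction. */
    long allocateBlock(String blockPoolId, short replication);
    void commitBlock(long blockId, long finalizedLength);
}
{code}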
[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880440#comment-13880440 ] Sanjay Radia commented on HDFS-4685: Comment on the two alternatives for the default ACL proposals in the doc. Reproducing the text for convenience. * *Umask-Default-ACL*: The default ACL of the parent is cloned to the ACL of the child at the time of child creation. For new child directories, the default ACL itself is also cloned, so that the same policy is applied to sub-directories of sub-directories. Subsequent changes to the parent's default ACL will set a different ACL for new children, but will not alter existing children. This matches POSIX behavior. If the administrator wants to change policy on the sub-tree later, then this is performed by inserting a new, more restrictive ACL entry at the appropriate sub-tree root (see UC6) and may also need a recursive ACL modification (analogous to chmod -R), since existing children are not affected by the new ACL. * *Inherited-Default-ACL*: A child that does not have an ACL of its own inherits its ACL from the nearest ancestor that has defined a default ACL. A child node that requires a different ACL can override the default (like the Umask-Default-ACL). Subsequent changes to the ancestor's default ACL will cause all children that do not have an ACL to inherit the new ACL regardless of child creation time (unlike Umask-Default-ACL). This model, like the ABAC ACLs (use case UC8), encourages the user to create fewer ACLs (typically on the roots of specific subtrees), while the POSIX-compliant Umask-Default-ACL is expected to result in a larger number of ACLs in the system. It would also make a memory-efficient implementation trivial. Note that this model is a deviation from POSIX behavior. Consider the following three sub use cases here: 4a) Open up a child for wider access than the default. 4b) Restrict a child to narrower access than the default. 4c) Change the default ACL because you made a mistake originally. Both models support use cases 4a and 4b with equal ease. However, with the Inherited-Default-ACL, it is easy to identify children that have overridden the default ACL - the existence of an ACL means that the user intended to override the default. Also, 4c is a natural fit for Inherited-Default-ACL. For the Umask-Default-ACL, every child has an ACL and hence you have to walk down the subtree and compare the ACL with the default to see if the user had intended to override it. I think the Inherited-Default-ACL is a much better design, but POSIX compliance may triumph, and hence I am willing to go with Umask-Default-ACL. Implementation of ACLs in HDFS -- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Attachments: HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf Currently hdfs doesn't support extended file ACLs. In unix, extended ACLs can be achieved using the getfacl and setfacl utilities. Is there anybody working on this feature? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
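For readers following along, the Umask-Default-ACL model can be illustrated with the FileSystem ACL calls that grew out of this jira. This is a sketch only; the group name and paths are invented, and it shows behavior rather than the design doc's exact syntax.

{code:java}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

/**
 * Sketch of the Umask-Default-ACL model (group name and paths invented).
 * A DEFAULT-scope entry on the parent is cloned onto children at creation
 * time; changing the parent's default later does not touch existing children.
 */
public class DefaultAclDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/projects/reports");

        AclEntry defaultSales = new AclEntry.Builder()
                .setScope(AclEntryScope.DEFAULT)     // applies to future children
                .setType(AclEntryType.GROUP)
                .setName("sales")
                .setPermission(FsAction.READ_EXECUTE)
                .build();
        fs.modifyAclEntries(dir, Arrays.asList(defaultSales));

        // A child created now clones the default; children created before the
        // default existed are untouched -- hence the recursive, chmod -R style
        // fixup mentioned above when policy changes on an existing subtree.
        fs.mkdirs(new Path(dir, "q3"));
        System.out.println(fs.getAclStatus(new Path(dir, "q3")));
    }
}
{code}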
[jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5389: --- Issue Type: Sub-task (was: Improvement) Parent: HDFS-2362 A Namenode that keeps only a part of the namespace in memory Key: HDFS-5389 URL: https://issues.apache.org/jira/browse/HDFS-5389 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 0.23.1 Reporter: Lin Xiao Priority: Minor *Background:* Currently, the NN keeps all of its namespace in memory. This has had the benefit that the NN code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can currently be scaled using more RAM on the NN and/or using Federation, which scales both namespace and performance. The current federation implementation does not allow renames across volumes without data copying, but there are proposals to remove that limitation. *Motivation:* Hadoop lets customers store huge amounts of data at very economical prices and hence allows customers to store their data for several years. While most customers perform analytics on recent data (last hour, day, week, month, quarter, year), the ability to have five-year-old data online for analytics is very attractive for many businesses. Although one can use larger RAM in a NN and/or use Federation, it is not really necessary to store the entire namespace in memory since only the recent data is typically heavily accessed. *Proposed Solution:* Store a portion of the NN's namespace in memory - the working set of the applications that are currently operating. LSM data structures are quite appropriate for maintaining the full namespace on disk. One choice is Google's LevelDB open-source implementation. *Benefits:* * Store larger namespaces without resorting to Federated namespace volumes. * Complementary to NN Federated namespace volumes; indeed, it will allow a single NN to easily store multiple larger volumes. * Faster cold startup - the NN does not have to read its full namespace before responding to clients. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861940#comment-13861940 ] Sanjay Radia commented on HDFS-5389: Lin will shortly post a link to her prototype code on GitHub. Her prototype is based on HDFS 0.23. A Namenode that keeps only a part of the namespace in memory Key: HDFS-5389 URL: https://issues.apache.org/jira/browse/HDFS-5389 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 0.23.1 Reporter: Lin Xiao Priority: Minor *Background:* Currently, the NN keeps all of its namespace in memory. This has had the benefit that the NN code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can currently be scaled using more RAM on the NN and/or using Federation, which scales both namespace and performance. The current federation implementation does not allow renames across volumes without data copying, but there are proposals to remove that limitation. *Motivation:* Hadoop lets customers store huge amounts of data at very economical prices and hence allows customers to store their data for several years. While most customers perform analytics on recent data (last hour, day, week, month, quarter, year), the ability to have five-year-old data online for analytics is very attractive for many businesses. Although one can use larger RAM in a NN and/or use Federation, it is not really necessary to store the entire namespace in memory since only the recent data is typically heavily accessed. *Proposed Solution:* Store a portion of the NN's namespace in memory - the working set of the applications that are currently operating. LSM data structures are quite appropriate for maintaining the full namespace on disk. One choice is Google's LevelDB open-source implementation. *Benefits:* * Store larger namespaces without resorting to Federated namespace volumes. * Complementary to NN Federated namespace volumes; indeed, it will allow a single NN to easily store multiple larger volumes. * Faster cold startup - the NN does not have to read its full namespace before responding to clients. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5711) Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk.
[ https://issues.apache.org/jira/browse/HDFS-5711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860846#comment-13860846 ] Sanjay Radia commented on HDFS-5711: bq. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution, by not just persisting the Namespace information but also the Blocks Map onto secondary storage. The namespace layer and the block layer have been separated reasonably well as part of the federation work. Hence it makes sense to keep the persistence of the namespace and the persistence of the block map as two separate jiras. Since there is already a jira, HDFS-5389, focusing on storing a portion (the working set) of the namespace in memory, this jira should focus on storing the block mapping on disk (as stated in the title). Removing memory limitation of the Namenode by persisting Block - Block location mappings to disk. - Key: HDFS-5711 URL: https://issues.apache.org/jira/browse/HDFS-5711 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Rohan Pasalkar This jira is to track changes to be made to remove the HDFS name-node memory limitation to hold block - block location mappings. It is a known fact that the single Name-node architecture of HDFS has scalability limits. The HDFS federation project alleviates this problem by using horizontal scaling. This helps increase the throughput of metadata operations and also the amount of data that can be stored in a Hadoop cluster. The Name-node stores all the filesystem metadata in memory (even in the federated architecture); the Name-node design can be enhanced by persisting part of the metadata onto secondary storage and retaining the popular or recently accessed metadata information in main memory. This design can benefit an HDFS deployment which doesn't use federation but needs to store a large number of files or a large number of blocks. Lin Xiao from Hortonworks attempted a similar project [1] in the Summer of 2013. They used LevelDB to persist the Namespace information (i.e., file and directory inode information). A patch with this change is yet to be submitted to the code base. We also intend to use LevelDB to persist metadata, and plan to provide a complete solution, by not just persisting the Namespace information but also the Blocks Map onto secondary storage. We did implement a basic prototype which stores the block - block location mapping metadata in a persistent key-value store, i.e., LevelDB. The prototype also maintains an in-memory cache of the recently used block - block location mapping metadata. References: [1] Lin Xiao, Hortonworks, Removing Name-node’s memory limitation, http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation -- This message was sent by Atlassian JIRA (v6.1.5#6160)
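To give a flavor of what persisting the block map to LevelDB might look like, here is a minimal sketch against the pure-Java org.iq80.leveldb port; the key and value encodings are invented for illustration and are not taken from the prototype described above.

{code:java}
import java.io.File;
import java.nio.charset.StandardCharsets;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import static org.iq80.leveldb.impl.Iq80DBFactory.factory;

/**
 * Minimal sketch of a LevelDB-backed block map (encodings invented):
 * blockId -> comma-separated DataNode list. LevelDB's LSM structure keeps
 * recent writes in memory and spills to sorted files on disk, which is what
 * makes a working-set-style Namenode plausible.
 */
public class BlockMapStore {
    public static void main(String[] args) throws Exception {
        Options options = new Options();
        options.createIfMissing(true);
        DB db = factory.open(new File("blockmap-db"), options);
        try {
            byte[] key = "blk_1073741825".getBytes(StandardCharsets.UTF_8);
            byte[] locations = "dn1:50010,dn5:50010,dn9:50010"
                    .getBytes(StandardCharsets.UTF_8);

            db.put(key, locations);      // persist the mapping
            byte[] found = db.get(key);  // point lookup on the read path
            System.out.println(new String(found, StandardCharsets.UTF_8));
        } finally {
            db.close();
        }
    }
}
{code}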
[jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5389: --- Summary: A Namenode that keeps only a part of the namespace in memory (was: Remove INode limitations in Namenode) A Namenode that keeps only a part of the namespace in memory Key: HDFS-5389 URL: https://issues.apache.org/jira/browse/HDFS-5389 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.23.1 Reporter: Lin Xiao Priority: Minor Current HDFS Namenode stores all of its metadata in RAM. This has allowed Hadoop clusters to scale to 100K concurrent tasks. However, the memory limits the total number of files that a single NN can store. While Federation allows one to create multiple volumes with additional Namenodes, there is a need to scale a single namespace and also to store multiple namespaces in a single Namenode. When inodes are also stored on persistent storage, the system's boot time can be significantly reduced because there is no need to replay edit logs. It also provides the potential to support extended attributes once the memory size is not the bottleneck. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay Radia updated HDFS-5389: --- Description: *Background:* Currently, the NN keeps all of its namespace in memory. This has had the benefit that the NN code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can currently be scaled using more RAM on the NN and/or using Federation, which scales both namespace and performance. The current federation implementation does not allow renames across volumes without data copying, but there are proposals to remove that limitation. *Motivation:* Hadoop lets customers store huge amounts of data at very economical prices and hence allows customers to store their data for several years. While most customers perform analytics on recent data (last hour, day, week, month, quarter, year), the ability to have five-year-old data online for analytics is very attractive for many businesses. Although one can use larger RAM in a NN and/or use Federation, it is not really necessary to store the entire namespace in memory since only the recent data is typically heavily accessed. *Proposed Solution:* Store a portion of the NN's namespace in memory - the working set of the applications that are currently operating. LSM data structures are quite appropriate for maintaining the full namespace on disk. One choice is Google's LevelDB open-source implementation. *Benefits:* * Store larger namespaces without resorting to Federated namespace volumes. * Complementary to NN Federated namespace volumes; indeed, it will allow a single NN to easily store multiple larger volumes. * Faster cold startup - the NN does not have to read its full namespace before responding to clients. was: Current HDFS Namenode stores all of its metadata in RAM. This has allowed Hadoop clusters to scale to 100K concurrent tasks. However, the memory limits the total number of files that a single NN can store. While Federation allows one to create multiple volumes with additional Namenodes, there is a need to scale a single namespace and also to store multiple namespaces in a single Namenode. When inodes are also stored on persistent storage, the system's boot time can be significantly reduced because there is no need to replay edit logs. It also provides the potential to support extended attributes once the memory size is not the bottleneck. A Namenode that keeps only a part of the namespace in memory Key: HDFS-5389 URL: https://issues.apache.org/jira/browse/HDFS-5389 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.23.1 Reporter: Lin Xiao Priority: Minor *Background:* Currently, the NN keeps all of its namespace in memory. This has had the benefit that the NN code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can currently be scaled using more RAM on the NN and/or using Federation, which scales both namespace and performance. The current federation implementation does not allow renames across volumes without data copying, but there are proposals to remove that limitation. *Motivation:* Hadoop lets customers store huge amounts of data at very economical prices and hence allows customers to store their data for several years. While most customers perform analytics on recent data (last hour, day, week, month, quarter, year), the ability to have five-year-old data online for analytics is very attractive for many businesses. Although one can use larger RAM in a NN and/or use Federation, it is not really necessary to store the entire namespace in memory since only the recent data is typically heavily accessed. *Proposed Solution:* Store a portion of the NN's namespace in memory - the working set of the applications that are currently operating. LSM data structures are quite appropriate for maintaining the full namespace on disk. One choice is Google's LevelDB open-source implementation. *Benefits:* * Store larger namespaces without resorting to Federated namespace volumes. * Complementary to NN Federated namespace volumes; indeed, it will allow a single NN to easily store multiple larger volumes. * Faster cold startup - the NN does not have to read its full namespace before responding to clients. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800015#comment-13800015 ] Sanjay Radia commented on HDFS-5389: Lin built a prototype as an intern at Hortonworks in 2013. Her [slides|http://www.slideshare.net/ydn/hadoop-meetup-hug-august-2013-removing-the-namenodes-memory-limitation] presented at Hadoop User Group (HUG) in August 2013 describes the approach and some early results. A Namenode that keeps only a part of the namespace in memory Key: HDFS-5389 URL: https://issues.apache.org/jira/browse/HDFS-5389 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.23.1 Reporter: Lin Xiao Priority: Minor *Background:* Currently, the NN Keeps all its namespace in memory. This has had the benefit that the NN code is very simple and, more importantly, helps the NN scale to over 4.5K machines with 60K to 100K concurrently tasks. HDFS namespace can be scaled currently using more Ram on the NN and/or using Federation which scales both namespace and performance. The current federation implementation does not allow renames across volumes without data copying but there are proposals to remove that limitation. *Motivation:* Hadoop lets customers store huge amounts of data at very economical prices and hence allows customers to store their data for several years. While most customers perform analytics on recent data (last hour, day, week, months, quarter, year), the ability to have five year old data online for analytics is very attractive for many businesses. Although one can use larger RAM in a NN and/or use Federation, it not really necessary to store the entire namespace in memory since only the recent data is typically heavily accessed. *Proposed Solution:* Store a portion of the NN's namespace in memory- the working set of the applications that are currently operating. LSM data structures are quite appropriate for maintaining the full namespace in memory. One choice is Google's LevelDB open-source implementation. *Benefits:* * Store larger namespaces without resorting to Federated namespace volumes. * Complementary to NN Federated namespace volumes, indeed will allow a single NN to easily store multiple larger volumes. * Faster cold startup - the NN does not have read its full namespace before responding to clients. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5324) Make Namespace implementation pluggable in the namenode
[ https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800022#comment-13800022 ] Sanjay Radia commented on HDFS-5324: Lin, a summer intern at Hortonworks, prototyped a NN that stores only the working set in memory (HDFS-5389). She made changes directly to NN code. Make Namespace implementation pluggable in the namenode --- Key: HDFS-5324 URL: https://issues.apache.org/jira/browse/HDFS-5324 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.1.1-beta Environment: All Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 3.0.0 Attachments: AbstractNamesystem.java For the last couple of months, we have been working on making Namespace implementation in the namenode pluggable. We have demonstrated that it can be done without major surgery on the namenode, and does not have noticeable performance impact. We would like to contribute it back to Apache HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5324) Make Namespace implementation pluggable in the namenode
[ https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790652#comment-13790652 ] Sanjay Radia commented on HDFS-5324: I am copying my comment from hdfs-dev: HDFS pluggability (and relation to pluggability added as part of Federation) * Pluggabilty and federation are orthogonal, although we did improved the pluggabily of HDFS as part of federation implementation. The *block layer* was separated out as part of the federation work and hence makes the general development of new of HDFS namespace implementations easier. Federation's pluggablity was targeted towards someone writing a new NN and reusing the block storage layer via a library and *optionally* living side-by-side with different implementations of the NN *within the same cluster*. Hence we added notion of block pools and separated out the block management layer. * So your proposed work is clearly not in conflict with Federation or even with the pluggability that Federation added, but philosophically, your proposal is complementary. Considerations: A Public API? The FileSystem/AbstractFileSystem APIs and the newly proposed AbstractFSNamesystem are targeting very different kinds of plugability into Hadoop. The former takes a thin application API (FileSystem and FileContext) and makes it easy for users to plug in different filesytems (S3, LocalFS, etc) as Hadoop compatible filesystems. In contrast the later (the proposed AbstractFSNamesystem) is a fatter interface inside the depths of HDFS implementation and makes parts of the impl pluggable. I would not make your proposed AbstractFSNamesystem a public stable Hadoop API but instead direct it towards to HDFS developers who want to extend the implementation of HDFS more easily. Were you envisioning the AbstractFSNamesystem to be a stable public Hadoop API? If someone has their own private implementation for this new abstract class, would the HDFS community have the freedom to modify the abstract class in incompatible ways? Make Namespace implementation pluggable in the namenode --- Key: HDFS-5324 URL: https://issues.apache.org/jira/browse/HDFS-5324 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.1.1-beta Environment: All Reporter: Milind Bhandarkar Assignee: Milind Bhandarkar Fix For: 3.0.0 For the last couple of months, we have been working on making Namespace implementation in the namenode pluggable. We have demonstrated that it can be done without major surgery on the namenode, and does not have noticeable performance impact. We would like to contribute it back to Apache HDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-4685) Implementation of extended file acl in hdfs
[ https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747891#comment-13747891 ] Sanjay Radia commented on HDFS-4685: When we added permissions to HDFS in Hadoop 0.16 we had originally considered having only ACLs for directories and not doing Unix like permissions. Then, due to lack time, we decide to go with unix style permissions. At that time we had proposed the following: * Any directory can have an ACL * An ACL specifies the list of users and groups that can access that directory's subtree. As you resolve paths you take the most stringent permission - i.e. the most restrictive permission. * An ACL also specified who can change the ACL. ie. changes to ACLs can be done by others besides the owner. Implementation of extended file acl in hdfs --- Key: HDFS-4685 URL: https://issues.apache.org/jira/browse/HDFS-4685 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode, security Affects Versions: 1.1.2 Reporter: Sachin Jose Assignee: Chris Nauroth Priority: Minor Currenly hdfs doesn't support Extended file ACL. In unix extended ACL can be achieved using getfacl and setfacl utilities. Is there anybody working on this feature ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712529#comment-13712529 ] Sanjay Radia commented on HDFS-4949: Caching partial blocks: There is no problem with a DN caching only the hot parts of a block and still declaring to the NN that the block is cached in ram. This would fit in with the proposal of abstracting ram copies as replicas. The use case that does not fit in is where DN1 has cached the first 100 bytes and and Datanode, DN2 has cached the last 100 bytes and you want the client to go to the right data node based on what portion of the file it is reading. If and when we finally get to caching portions and we want to support the use case mentioned, we, at that time, could considering the block-info sent for RAM replicas to indicate what portion are cached -- this would mean that certain replicas have additional in the block map. Given that we are not caching portions of block for this Jira and that for tiered storage for SSDs we want to add the device info to block location, I suggest that we proceed with abstracting RAM copies as replicas and later revisit this decision for partial block caching at a later point. Centralized cache management in HDFS Key: HDFS-4949 URL: https://issues.apache.org/jira/browse/HDFS-4949 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: caching-design-doc-2013-07-02.pdf HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher level application frameworks like Hive, Pig, and Impala to effectively use cluster memory, because they cannot explicitly cache important datasets or place their tasks for memory locality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712856#comment-13712856 ] Sanjay Radia commented on HDFS-4949: -
bq. we have to use mmap of a file on disk.
Please look at my comments: I have not objected to mmap and mlock. I am fine with having RAM replicas backed by disk replicas; indeed I see this as an important advantage over ramfs, where the data is copied. The replica abstraction allows for a more general view in which they are not, but our implementation restricts memory replicas to be backed by disk replicas.
bq. In general, tiered storage management happens over a longer period of time than cache management.
The term tiered storage is unfortunate (I misused it in my original comment). In HDFS-2832 we consciously used the term heterogeneous storage, not tiered storage. Tiering, as in moving things based on their hotness, is policy. (BTW I envision using SSDs initially not for moving hot blocks but as storage for *one* of 3 replicas. I have discussed this use case with a few of the HBase folks.) Caching is a use case that applies well to RAM vs. disks. Both use cases apply well to the abstraction of replicas stored on different kinds of storage devices.
bq. Memory is not a storage tier. It doesn't store anything; rather, it caches. Does it make sense to fsck memory?
That is silly. Memory and disks both store data; one is simply far more durable. Fsck is a bad example: you run fsck on a file system, not on the disk. Here we are talking about entities that store HDFS block data. But this debate over the similarities and differences between RAM and disk is a longer one that we should have over beer. I am not blind to the differences between disks and RAM. Further, using the same abstraction to model RAM copies and disk copies does not mean I will always treat them exactly the same and ignore the differences.

> Centralized cache management in HDFS
> ------------------------------------
>
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: caching-design-doc-2013-07-02.pdf
>
> HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher-level application frameworks like Hive, Pig, and Impala to effectively use cluster memory, because they cannot explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
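On the mmap-and-mlock point: in Java the mmap half is available through FileChannel.map, while mlock-style pinning requires native code (the HDFS caching work ultimately did this through JNI, omitted here). The block file path below is hypothetical; this is a minimal sketch of a RAM copy backed by a disk replica, not HDFS code.

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapBackedReplica {
  public static void main(String[] args) throws IOException {
    // Hypothetical block file; the disk replica remains the durable copy.
    Path blockFile = Paths.get("/data/dn1/current/blk_1073741825");
    try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      buf.load(); // ask the OS to page the content into physical memory
      // Reads are now served from the in-memory mapping; nothing is
      // duplicated into a separate ramfs file, unlike the copy-based
      // alternative criticized above.
      System.out.println("first byte: " + buf.get(0));
    }
  }
}
{code}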
[jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713117#comment-13713117 ] Sanjay Radia commented on HDFS-4949: -
To converge on this, could we do a meetup?

> Centralized cache management in HDFS
> ------------------------------------
>
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: caching-design-doc-2013-07-02.pdf
>
> HDFS currently has no support for managing or exposing in-memory caches at datanodes. This makes it harder for higher-level application frameworks like Hive, Pig, and Impala to effectively use cluster memory, because they cannot explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5005) Move SnapshotException and SnapshotAccessControlException to o.a.h.hdfs.protocol
[ https://issues.apache.org/jira/browse/HDFS-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711729#comment-13711729 ] Sanjay Radia commented on HDFS-5005: -
+1

> Move SnapshotException and SnapshotAccessControlException to o.a.h.hdfs.protocol
> --------------------------------------------------------------------------------
>
> Key: HDFS-5005
> URL: https://issues.apache.org/jira/browse/HDFS-5005
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.1.0-beta
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: HDFS-5005.001.patch
>
> We should move the definition of these two exceptions to the protocol package since they can be directly passed to clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira