[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-04-09 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487635#comment-14487635
 ] 

Edward Bortnikov commented on HDFS-7240:


Great stuff. Block- and object-level storage scales much better from the 
metadata perspective (flat space). This could play really well with the 
block-management-as-a-service proposal (HDFS-5477), which splits the namenode 
into the FS manager and the block manager services, and scales the latter 
horizontally. 

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7244) Reduce Namenode memory using Flyweight pattern

2014-10-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182061#comment-14182061
 ] 

Edward Bortnikov commented on HDFS-7244:


Colin,

As a matter of fact, Daryn Sharp is working on committing HDFS-6658 and the 
subtasks to the trunk. This is fairly self-contained and stable code, of 
independent value. I'd suggest checking it in independently. Daryn - please 
chime in, you are the committer. 

 Reduce Namenode memory using Flyweight pattern
 --

 Key: HDFS-7244
 URL: https://issues.apache.org/jira/browse/HDFS-7244
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Amir Langer

 Using the flyweight pattern can dramatically reduce memory usage in the 
 Namenode. The pattern also abstracts the actual storage type, so whether the 
 storage is off-heap and which serialisation mechanism it uses can be 
 configured per deployment. 
 The idea is to move all BlockInfo data (as a first step) to this storage 
 using the Flyweight pattern. The cost of doing so will be higher latency 
 when accessing/modifying a block. The idea is that this will be offset by a 
 reduction in memory and, in the case of off-heap storage, a dramatic 
 reduction in memory (effectively, memory used for BlockInfo would reduce to a 
 very small constant value).
 This reduction will also have a huge impact on latency, as GC pauses will 
 be reduced considerably, and may even end up with better latency results than 
 the original code.
 I wrote a stand-alone project as a proof of concept, to show the pattern, the 
 data structure we can use and what will be the performance costs of this 
 approach.
 see [Slab|https://github.com/langera/slab]
 and [Slab performance 
 results|https://github.com/langera/slab/wiki/Performance-Results].
 Slab abstracts the storage, gives several storage implementations and 
 implements the flyweight pattern for the application (Namenode in our case).
 The stages to incorporate Slab into the Namenode are outlined in the sub-task 
 JIRAs.
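
To make the flyweight idea above concrete, here is a minimal sketch in Java. It is not the Slab API: the class name `BlockRecordStore`, the `Cursor` type and the 24-byte record layout (block id, byte count, generation stamp) are all assumptions made for illustration. The point it tries to show is that records live in one `ByteBuffer` (on-heap or off-heap, chosen at construction), while a single reusable cursor object stands in for what would otherwise be one Java object per block.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-width record store; illustrates the pattern, not the Slab API. */
final class BlockRecordStore {
    // Assumed record layout: blockId (8 bytes) + numBytes (8) + generationStamp (8).
    static final int RECORD_SIZE = 24;

    private final ByteBuffer buffer;
    private final int capacity;
    private int size = 0;

    BlockRecordStore(int capacity, boolean offHeap) {
        this.capacity = capacity;
        int bytes = Math.multiplyExact(capacity, RECORD_SIZE);
        // The storage type is a deployment choice; callers never see it.
        this.buffer = offHeap ? ByteBuffer.allocateDirect(bytes) : ByteBuffer.allocate(bytes);
    }

    /** Appends a record and returns its index. */
    int add(long blockId, long numBytes, long generationStamp) {
        if (size == capacity) {
            throw new IllegalStateException("store is full");
        }
        int base = size * RECORD_SIZE;
        buffer.putLong(base, blockId);
        buffer.putLong(base + 8, numBytes);
        buffer.putLong(base + 16, generationStamp);
        return size++;
    }

    /** Flyweight: one cursor is repositioned over many records instead of allocating one object per block. */
    final class Cursor {
        private int base;

        Cursor moveTo(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            base = index * RECORD_SIZE;
            return this;
        }

        long blockId() { return buffer.getLong(base); }
        long numBytes() { return buffer.getLong(base + 8); }
        long generationStamp() { return buffer.getLong(base + 16); }
        void setNumBytes(long value) { buffer.putLong(base + 8, value); }
    }

    Cursor cursor() { return new Cursor(); }
}
```

Every field access goes through the buffer, which is where the extra per-access latency mentioned above would come from; in exchange, the garbage collector sees one buffer and one cursor rather than millions of small objects.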





[jira] [Commented] (HDFS-7240) Object store in HDFS

2014-10-16 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173423#comment-14173423
 ] 

Edward Bortnikov commented on HDFS-7240:


Very interested to follow. How is this related to the previous jira and design 
on Block-Management-as-a-Service (HDFS-5477)? 

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey

 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.





[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-08-28 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113773#comment-14113773
 ] 

Edward Bortnikov commented on HDFS-6658:


Looks convincing, Amir! It seems that everybody is +1 for this short-term 
improvement. Can we move on towards committing this code, while proceeding with 
the long-term discussion on HDFS-5477 and HDFS-6709? 

 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Amir Langer
 Attachments: BlockListOptimizationComparison.xlsx, Namenode Memory 
 Optimizations - Block replicas list.docx


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.
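
As a rough illustration of the description above (not the design from the attached document), the following Java sketch keeps one doubly linked list of block replicas per storage in primitive `int` arrays. The class name `IntLinkedBlockList`, the slot and free-list scheme and the notion of a "block index" into some global block table are assumptions made for this example. Links are array slots instead of object references, and the arrays belong to the storage rather than to each replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-storage replica list that links entries by int slot, not by object reference. */
final class IntLinkedBlockList {
    private static final int NIL = -1;

    private int[] block; // block[slot] = index of the block in an assumed global block table
    private int[] next;  // next[slot]  = following slot, or NIL
    private int[] prev;  // prev[slot]  = preceding slot, or NIL
    private int head = NIL;
    private int free = NIL; // head of the chain of recycled slots, linked through next[]
    private int used = 0;   // number of slots handed out so far

    IntLinkedBlockList(int initialCapacity) {
        int cap = Math.max(1, initialCapacity);
        block = new int[cap];
        next = new int[cap];
        prev = new int[cap];
    }

    /** Links a replica at the head of the list and returns the slot that now holds it. */
    int add(int blockIndex) {
        int slot;
        if (free != NIL) {
            slot = free;
            free = next[slot];
        } else {
            if (used == block.length) {
                grow();
            }
            slot = used++;
        }
        block[slot] = blockIndex;
        prev[slot] = NIL;
        next[slot] = head;
        if (head != NIL) {
            prev[head] = slot;
        }
        head = slot;
        return slot;
    }

    /** Unlinks a slot in O(1). The caller must pass a slot that is currently linked. */
    void remove(int slot) {
        int p = prev[slot];
        int n = next[slot];
        if (p != NIL) { next[p] = n; } else { head = n; }
        if (n != NIL) { prev[n] = p; }
        next[slot] = free; // recycle the slot
        free = slot;
    }

    /** Returns the block indexes in list order, head first. */
    List<Integer> blocks() {
        List<Integer> out = new ArrayList<>();
        for (int s = head; s != NIL; s = next[s]) {
            out.add(block[s]);
        }
        return out;
    }

    private void grow() {
        int cap = block.length * 2;
        block = Arrays.copyOf(block, cap);
        next = Arrays.copyOf(next, cap);
        prev = Arrays.copyOf(prev, cap);
    }
}
```

With 4-byte `int` links the per-replica cost no longer depends on the JVM's reference width, which is consistent with the description's remark about compressed oops being disabled.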





[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-07-14 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061213#comment-14061213
 ] 

Edward Bortnikov commented on HDFS-6658:


As opposed to block-management-as-a-service (HDFS-5477), this optimization is 
narrowly scoped (a data structure modification), and introduces minimal risk. 
The saving is about 20% of the block management footprint, or about 10% of the 
total NN footprint.  

The design in HDFS-5477 details why off-heap swap space management is not an 
option in high-end settings (terabytes of metadata). If the off-heap memory is 
managed on SSD, this is still two orders of magnitude slower than DDR3. In this 
setting, block reports in large clusters cannot be sustained because they have 
no locality of reference. 
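
A side note on the two savings figures quoted in this comment: taken together, they imply that block management accounts for roughly half of the NN footprint. Writing B for the block management footprint, T for the total NN footprint and S for the saving (symbols introduced here only for illustration):

```latex
\[
S \approx 0.20\,B \approx 0.10\,T
\quad\Longrightarrow\quad
\frac{B}{T} \approx \frac{0.10}{0.20} = 0.5
\]
```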

 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Amir Langer
 Attachments: Namenode Memory Optimizations - Block replicas list.docx


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.





[jira] [Updated] (HDFS-5477) Block manager as a service

2014-06-19 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HDFS-5477:
---

Attachment: Block-Manager-as-a-Service.pdf

End-to-end specification and design of the Block-Manager-as-a-service feature, 
addressing the issues raised in the past discussions. 

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Block-Manager-as-a-Service.pdf, Proposal.pdf, 
 Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.





[jira] [Commented] (HDFS-5477) Block manager as a service

2014-06-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037858#comment-14037858
 ] 

Edward Bortnikov commented on HDFS-5477:


The published document supersedes all the previous posts.

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Block-Manager-as-a-Service.pdf, Proposal.pdf, 
 Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.





[jira] [Commented] (HDFS-5477) Block manager as a service

2014-04-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978385#comment-13978385
 ] 

Edward Bortnikov commented on HDFS-5477:


Sorry for the setback, still working on the design ... Will publish very soon, 
probably next week. 

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Block Manager as a Service - Implementation 
 decisions.pdf, Proposal.pdf, Proposal.pdf, Remote BM.pdf, Standalone BM.pdf, 
 Standalone BM.pdf, patches.tar.gz


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.





[jira] [Commented] (HDFS-5477) Block manager as a service

2014-04-05 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961207#comment-13961207
 ] 

Edward Bortnikov commented on HDFS-5477:


Working on a new design doc following our fruitful discussions at Hadoop Summit 
- will post in about a week. Stay tuned ...

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Block Manager as a Service - Implementation 
 decisions.pdf, Proposal.pdf, Proposal.pdf, Remote BM.pdf, Standalone BM.pdf, 
 Standalone BM.pdf, patches.tar.gz


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.





[jira] [Updated] (HDFS-5732) Separate memory space between BM and NN

2014-03-11 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HDFS-5732:
---

Attachment: Remote BM.pdf

Updated design of the remote BM operation. 

 Separate memory space between BM and NN
 ---

 Key: HDFS-5732
 URL: https://issues.apache.org/jira/browse/HDFS-5732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Amir Langer
 Attachments: 
 0002-Separation-of-BM-from-NN-Step-2-Separate-memory-spac.patch, Remote BM.pdf


 Change created APIs to not rely on the same instance being shared in both BM 
 and NN. Use immutable objects / keep state in sync.
 BM and NN will still exist in the same VM; work on a new BM service as an 
 independent process is deferred to later tasks.
 Also, a one to one relation between BM and NN is assumed. 
 This task should maintain backward compatibility.
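
One way to read "use immutable objects" in the description above is sketched below in Java. The class `ReplicaSnapshot` and its fields are hypothetical and do not come from the attached patch. The sketch only shows the general technique: a value passed between BM and NN copies its inputs and never exposes internal state, so neither side can depend on sharing a mutable instance with the other.

```java
import java.util.Arrays;

/** Hypothetical immutable value exchanged between BM and NN; not taken from the attached patch. */
final class ReplicaSnapshot {
    private final long blockId;
    private final String[] storageIds;

    ReplicaSnapshot(long blockId, String[] storageIds) {
        this.blockId = blockId;
        this.storageIds = storageIds.clone(); // defensive copy: later caller changes are not visible here
    }

    long blockId() {
        return blockId;
    }

    String[] storageIds() {
        return storageIds.clone(); // hand out a copy so callers cannot mutate internal state
    }

    /** A state change produces a new snapshot instead of mutating this one. */
    ReplicaSnapshot withStorage(String storageId) {
        String[] updated = Arrays.copyOf(storageIds, storageIds.length + 1);
        updated[storageIds.length] = storageId;
        return new ReplicaSnapshot(blockId, updated);
    }
}
```

Because such a value carries no shared references, the same API shape would keep working if the BM later moved to a separate process, where the object would have to be serialized anyway.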





[jira] [Commented] (HDFS-5477) Block manager as a service

2014-03-11 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931095#comment-13931095
 ] 

Edward Bortnikov commented on HDFS-5477:


All great questions. Our previous documentation was pretty substandard. 
New design PDF attached at https://issues.apache.org/jira/browse/HDFS-5732 - it 
clarifies many things about the remote NN operation. 

Regarding Todd's question specifically. Yes, it's impossible to guarantee the 
100% atomicity of transactions in the face of failures. This atomicity is also 
not necessarily required as long as no data is lost. The distributed state must 
eventually converge. 

Our solution is to treat communication failures and process failures 
identically. If an RPC times out, we re-establish the NN-BM connection and 
re-synchronize the state. (There are many ways to optimize this process, e.g., 
Merkle trees). Since timeouts should be very rare in a datacenter network (in 
the absence of bugs), this treatment is not too harsh. 
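
The failure-handling policy described above can be sketched in Java roughly as follows. This is an illustration, not code from the patches: the `BmLink` and `Rpc` interfaces, the method names and the bounded retry after re-synchronization are all assumptions. What it tries to capture is that a timeout is handled exactly like a peer failure: reconnect first, then re-synchronize state, and only then continue.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: an RPC timeout is treated the same way as a failure of the remote process. */
final class ResyncOnTimeout {

    /** Assumed view of the NN-to-BM connection; not a Hadoop interface. */
    interface BmLink {
        void reconnect();
        void resynchronize(); // e.g. a full state exchange, or a cheaper diff such as a Merkle-tree comparison
    }

    /** Assumed shape of a remote call that may time out. */
    interface Rpc<T> {
        T call() throws TimeoutException;
    }

    static <T> T call(BmLink link, Rpc<T> rpc, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (TimeoutException e) {
                // A slow network and a dead peer look the same from here,
                // so both are handled as a process failure.
                last = e;
                link.reconnect();
                link.resynchronize();
            }
        }
        throw new IllegalStateException("remote call still timing out after " + maxAttempts + " attempts", last);
    }
}
```

Retrying the call after re-synchronization is a choice made for this sketch; the comment above only states that the connection is re-established and the state re-synchronized.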

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, 
 Standalone BM.pdf, patches.tar.gz


 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.





[jira] [Updated] (HDFS-5453) Support fine grain locking in FSNamesystem

2013-12-23 Thread Edward Bortnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Bortnikov updated HDFS-5453:
---

Attachment: async_simulation.xlsx

Scale-up of the number of operations per second for different workloads, on an 
8-core CPU, with HW threading. 

 Support fine grain locking in FSNamesystem
 --

 Key: HDFS-5453
 URL: https://issues.apache.org/jira/browse/HDFS-5453
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: async_simulation.xlsx


 The namesystem currently uses a coarse grain lock to control access.  This 
 prevents concurrent writers in different branches of the tree, and prevents 
 readers from accessing branches that writers aren't using.
 Features that introduce latency to namesystem operations, such as cold 
 storage of inodes, will need fine grain locking to avoid degrading the entire 
 namesystem's throughput.





[jira] [Commented] (HDFS-5453) Support fine grain locking in FSNamesystem

2013-12-23 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855489#comment-13855489
 ] 

Edward Bortnikov commented on HDFS-5453:


Orthogonally to the recent thread ... Attached is a spreadsheet with simulation 
results that exemplify the ballpark gain from transitioning to a completely 
lock-free architecture. It is based on very recent Hadoop 2.0 code. 

This is more of a long-term feature, which we believe is what the namenode 
ultimately needs. 

 Support fine grain locking in FSNamesystem
 --

 Key: HDFS-5453
 URL: https://issues.apache.org/jira/browse/HDFS-5453
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: async_simulation.xlsx


 The namesystem currently uses a coarse grain lock to control access.  This 
 prevents concurrent writers in different branches of the tree, and prevents 
 readers from accessing branches that writers aren't using.
 Features that introduce latency to namesystem operations, such as cold 
 storage of inodes, will need fine grain locking to avoid degrading the entire 
 namesystem's throughput.





[jira] [Commented] (HDFS-5453) Support fine grain locking in FSNamesystem

2013-12-19 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852787#comment-13852787
 ] 

Edward Bortnikov commented on HDFS-5453:


Suresh - we have done this evaluation. We switched off all traces and edit log, 
and ran a workload of 3 reads:1 write, on an 8-core CPU. With the current 
(global lock) synchronization in place, it scales about 2.5x compared to 
single-core throughput. Fine grained locking does not change the picture much - 
i.e., in CPU-bound workloads most of the time is wasted on concurrency control. 

Without any synchronization, the code scales above 7.5x (as expected). This 
underscores the potential in re-architecting Hadoop's namenode to a lock-free, 
asynchronous server with a lean custom scheduler to take care of conflicts. 
This discussion can be started in a separate JIRA. 
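
For a back-of-the-envelope reading of the two scaling figures above, one can plug them into Amdahl's law. Whether Amdahl's law is an adequate model for this workload is an assumption of this note, not something the measurements establish, and the class and method names below are illustrative. The Java sketch solves S = 1 / (s + (1 - s) / n) for the serialized fraction s.

```java
/** Illustrative only: estimates the serialized fraction implied by a measured speedup under Amdahl's law. */
final class AmdahlEstimate {

    /** Solves S = 1 / (s + (1 - s) / n) for s, given speedup S on n cores. */
    static double serialFraction(double speedup, int cores) {
        if (cores < 2 || speedup <= 0 || speedup > cores) {
            throw new IllegalArgumentException("need cores >= 2 and 0 < speedup <= cores");
        }
        return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores);
    }

    public static void main(String[] args) {
        // Figures quoted in the comment above: 8 cores, 2.5x with the global lock, 7.5x without synchronization.
        System.out.printf("global lock: serial fraction ~ %.3f%n", serialFraction(2.5, 8));
        System.out.printf("no sync:     serial fraction ~ %.3f%n", serialFraction(7.5, 8));
    }
}
```

Under that model, 2.5x on 8 cores corresponds to roughly 31% of the work being serialized, while 7.5x corresponds to about 1%. Since the comment reports scaling "above 7.5x", the second number is an upper bound within this model.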

 Support fine grain locking in FSNamesystem
 --

 Key: HDFS-5453
 URL: https://issues.apache.org/jira/browse/HDFS-5453
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

 The namesystem currently uses a coarse grain lock to control access.  This 
 prevents concurrent writers in different branches of the tree, and prevents 
 readers from accessing branches that writers aren't using.
 Features that introduce latency to namesystem operations, such as cold 
 storage of inodes, will need fine grain locking to avoid degrading the entire 
 namesystem's throughput.





[jira] [Commented] (HDFS-5453) Support fine grain locking in FSNamesystem

2013-12-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851722#comment-13851722
 ] 

Edward Bortnikov commented on HDFS-5453:


Fine-grained locking is not expected to have a positive performance impact with 
a single-process namenode implementation. The execution of every API is so 
short that it does not justify acquisition of fine-grained locks along the path 
(we have evaluated the performance). 

Fine grained locking makes sense (and is a crucial feature) in the context of 
block management as a service 
(see https://issues.apache.org/jira/browse/HDFS-5477). When block management is 
an external process, the NN cannot afford holding a global lock during the RPC. 

 Support fine grain locking in FSNamesystem
 --

 Key: HDFS-5453
 URL: https://issues.apache.org/jira/browse/HDFS-5453
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

 The namesystem currently uses a coarse grain lock to control access.  This 
 prevents concurrent writers in different branches of the tree, and prevents 
 readers from accessing branches that writers aren't using.
 Features that introduce latency to namesystem operations, such as cold 
 storage of inodes, will need fine grain locking to avoid degrading the entire 
 namesystem's throughput.





[jira] [Commented] (HDFS-5453) Support fine grain locking in FSNamesystem

2013-11-18 Thread Edward Bortnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825798#comment-13825798
 ] 

Edward Bortnikov commented on HDFS-5453:


We are working with Daryn on this feature. Fine-grained locking is crucial when the 
request implementation makes a blocking call, e.g., an RPC to a remote block 
management service. In this setting, the global lock becomes awful - it's easy 
to demonstrate. With the current single-process implementation, there is no 
visible difference from the global lock, because the request code executes 
locally, and is very fast. 
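
The effect described above can be demonstrated with a small, deterministic Java sketch. Nothing here is FSNamesystem code: the `LockPolicy` interface, the per-path lock map and the latch standing in for a blocking RPC are assumptions made for illustration, and a real namespace would also need to lock ancestor directories. One request holds its lock while "blocked in an RPC", and the sketch checks whether a request on an unrelated path could acquire its own lock in the meantime.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo: can an unrelated request make progress while another one blocks in an RPC? */
final class LockGranularityDemo {

    interface LockPolicy {
        ReentrantLock lockFor(String path);
    }

    /** Every path maps to the same lock, like a single namesystem-wide lock. */
    static final class GlobalLock implements LockPolicy {
        private final ReentrantLock lock = new ReentrantLock();
        @Override public ReentrantLock lockFor(String path) { return lock; }
    }

    /** One lock per path; a simplification that ignores ancestor directories. */
    static final class PerPathLock implements LockPolicy {
        private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
        @Override public ReentrantLock lockFor(String path) {
            return locks.computeIfAbsent(path, p -> new ReentrantLock());
        }
    }

    static boolean otherRequestCanProceed(LockPolicy policy, String busyPath, String otherPath) {
        CountDownLatch rpcStarted = new CountDownLatch(1);
        CountDownLatch rpcFinished = new CountDownLatch(1);
        Thread blockedRequest = new Thread(() -> {
            ReentrantLock lock = policy.lockFor(busyPath);
            lock.lock();
            try {
                rpcStarted.countDown();
                rpcFinished.await(); // stands in for a blocking RPC made while holding the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        blockedRequest.start();
        try {
            rpcStarted.await();
            ReentrantLock other = policy.lockFor(otherPath);
            boolean acquired = other.tryLock(); // non-blocking probe from a second thread
            if (acquired) {
                other.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            rpcFinished.countDown();
            try {
                blockedRequest.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("global lock:   " + otherRequestCanProceed(new GlobalLock(), "/a/x", "/b/y"));
        System.out.println("per-path lock: " + otherRequestCanProceed(new PerPathLock(), "/a/x", "/b/y"));
    }
}
```

With the global policy the probe fails for as long as the simulated RPC is outstanding, whereas with per-path locks it succeeds. When no request blocks, as in the current single-process implementation, the two policies are indistinguishable in this sketch, which matches the observation above.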

 Support fine grain locking in FSNamesystem
 --

 Key: HDFS-5453
 URL: https://issues.apache.org/jira/browse/HDFS-5453
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

 The namesystem currently uses a coarse grain lock to control access.  This 
 prevents concurrent writers in different branches of the tree, and prevents 
 readers from accessing branches that writers aren't using.
 Features that introduce latency to namesystem operations, such as cold 
 storage of inodes, will need fine grain locking to avoid degrading the entire 
 namesystem's throughput.


