[ https://issues.apache.org/jira/browse/HDFS-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298979#comment-16298979 ]
Joe Pallas commented on HDFS-10419: ----------------------------------- So the consensus mechanism guarantees all the container replicas are consistent, but the NN is not part of that consensus. Since the container reports don't have block-level information, the NN will make an explicit request to one of the container replicas for block info in the case of a client failure. Is that correct? > Building HDFS on top of Ozone's storage containers > -------------------------------------------------- > > Key: HDFS-10419 > URL: https://issues.apache.org/jira/browse/HDFS-10419 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: Jing Zhao > Assignee: Jing Zhao > Attachments: Evolving NN using new block-container layer.pdf > > > In HDFS-7240, Ozone defines storage containers to store both the data and the > metadata. The storage container layer provides an object storage interface > and aims to manage data/metadata in a distributed manner. More details about > storage containers can be found in the design doc in HDFS-7240. > HDFS can adopt the storage containers to store and manage blocks. The general > idea is: > # Each block can be treated as an object and the block ID is the object's key. > # Blocks will still be stored in DataNodes but as objects in storage > containers. > # The block management work can be separated out of the NameNode and will be > handled by the storage container layer in a more distributed way. The > NameNode will only manage the namespace (i.e., files and directories). > # For each file, the NameNode only needs to record a list of block IDs which > are used as keys to obtain real data from storage containers. > # A new DFSClient implementation talks to both NameNode and the storage > container layer to read/write. > HDFS, especially the NameNode, can get much better scalability from this > design. Currently the NameNode's heaviest workload comes from the block > management, which includes maintaining the block-DataNode mapping, receiving > full/incremental block reports, tracking block states (under/over/miss > replicated), and joining every writing pipeline protocol to guarantee the > data consistency. These work bring high memory footprint and make NameNode > suffer from GC. HDFS-5477 already proposes to convert BlockManager as a > service. If we can build HDFS on top of the storage container layer, we not > only separate out the BlockManager from the NameNode, but also replace it > with a new distributed management scheme. > The storage container work is currently in progress in HDFS-7240, and the > work proposed here is still in an experimental/exploring stage. We can do > this experiment in a feature branch so that people with interests can be > involved. > A design doc will be uploaded later explaining more details. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org