[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haohui Mai reassigned HDFS-5389:
--------------------------------

    Assignee: Haohui Mai

> A Namenode that keeps only a part of the namespace in memory
> ------------------------------------------------------------
>
>                 Key: HDFS-5389
>                 URL: https://issues.apache.org/jira/browse/HDFS-5389
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 0.23.1
>            Reporter: Lin Xiao
>            Assignee: Haohui Mai
>            Priority: Minor
>
> *Background:*
> Currently, the NN keeps all of its namespace in memory. This has had the benefit
> that the NN code is very simple and, more importantly, helps the NN scale to
> over 4.5K machines with 60K to 100K concurrent tasks. The HDFS namespace can
> currently be scaled by using more RAM on the NN and/or by using Federation, which
> scales both namespace and performance. The current Federation implementation
> does not allow renames across volumes without data copying, but there are
> proposals to remove that limitation.
> *Motivation:*
> Hadoop lets customers store huge amounts of data at very economical prices
> and hence allows customers to store their data for several years. While most
> customers perform analytics on recent data (last hour, day, week, months,
> quarter, year), the ability to have five-year-old data online for analytics
> is very attractive for many businesses. Although one can use larger RAM in a
> NN and/or use Federation, it is not really necessary to store the entire
> namespace in memory, since only the recent data is typically heavily accessed.
> *Proposed Solution:*
> Store a portion of the NN's namespace in memory: the "working set" of the
> applications that are currently operating. LSM data structures are quite
> appropriate for maintaining the full namespace in memory. One choice is
> Google's LevelDB open-source implementation.
> *Benefits:*
> * Store larger namespaces without resorting to Federated namespace volumes.
> * Complementary to NN Federated namespace volumes; indeed, it will allow a
> single NN to easily store multiple larger volumes.
> * Faster cold startup: the NN does not have to read its full namespace before
> responding to clients.
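
The working-set idea in the proposed solution can be illustrated with a minimal sketch. It is not the HDFS design or any actual NameNode code. It assumes an LRU eviction policy and write-through updates, neither of which the issue specifies, and the class name `WorkingSetNamespace`, its methods, and the plain dict standing in for an LSM store such as LevelDB are all hypothetical.

```python
from collections import OrderedDict


class WorkingSetNamespace:
    """Keeps at most `capacity` inodes in memory; the rest live in a backing store.

    `backing_store` is a plain dict here, a stand-in for an on-disk
    LSM store such as LevelDB (an assumption made for illustration only).
    """

    def __init__(self, capacity, backing_store=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.backing_store = {} if backing_store is None else backing_store
        self.cache = OrderedDict()  # path -> inode metadata, least recent first
        self.misses = 0             # lookups that had to go to the backing store

    def _remember(self, path, inode):
        # Mark the entry as most recently used, then evict the least
        # recently used entries until the cache fits within `capacity`.
        self.cache[path] = inode
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evicted entry stays in the store

    def put(self, path, inode):
        # Write-through (assumed policy): the store always has the full namespace.
        self.backing_store[path] = inode
        self._remember(path, inode)

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)
            return self.cache[path]
        # Cache miss: fall back to the backing store and pull the entry
        # into the working set if it exists.
        self.misses += 1
        inode = self.backing_store.get(path)
        if inode is not None:
            self._remember(path, inode)
        return inode


if __name__ == "__main__":
    ns = WorkingSetNamespace(capacity=2)
    ns.put("/logs/2009/a", {"blocks": 3})
    ns.put("/logs/2014/b", {"blocks": 1})
    ns.put("/logs/2014/c", {"blocks": 2})
    print(list(ns.cache))          # only the two most recent paths are in memory
    print(ns.get("/logs/2009/a"))  # old entry is served from the backing store
```

Under these assumptions, a cold start corresponds to constructing the object with an already populated `backing_store` and an empty cache: lookups can be answered immediately, and the in-memory working set fills in as paths are accessed.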