[ https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382535#comment-15382535 ]
Inigo Goiri commented on HDFS-10467:
------------------------------------

[~mingma], thank you for the comments. A few answers/clarifications.

bq. Support for mergeFs HADOOP-8298. We should be able to extend the design to support this. There might be some issues around how to provision a new sub folder (which namespace should own that) and how it works with rebalancer. This could be a good addition for future work section.

In the prototype we actually started this, but we haven't tested it yet. In addition, I think merge points go a little in the direction of N-Fly in HADOOP-12077. I think we should support both of them together. I'll add the reference explicitly to the document.

bq. Handling of inconsistent state. Given routers cache which namenodes are active, the state could be different from the actual namenode at that moment. Thus routers might get {{StandbyException}} and need to retry on another namenode. If so, does it mean the routers should leverage ipc {{FailoverOnNetworkExceptionRetry}} or use {{DFSClient}} with hint for active namenode?

In the current implementation we use the client with the hint. We first try the namenode marked as active in the State Store, and we catch {{StandbyException}} and similar exceptions so we can retry on another namenode. This is in HDFS-10629 in {{RouterRpcServer#invokeMethod()}}.

bq. Soft state vs hard state. While subcluster active namenode machine and load/space are soft state that can be reconstructed from namenodes, mount table is hard state that needs to be persisted. Is there any benefit separating them out to use different state stores as they have different persistence requirements, access patterns (mount table doesn't change much while load/space update is frequent) and admin interface? For example, admin might want to update mount table on demand, but not load/space state.

True, this is easy to implement right now. We should see if people are OK with the additional complexity of configuring two backends. I guess we can discuss this in HDFS-10630.

bq. Usage of subcluster load/space state. Is it correct that the only consumer of subcluster's load/space state is the rebalancer? I imagine initially we would run rebalancer manually. For that, the rebalancer can just pull subcluster's load/space state from namenodes on demand. Then we don't have to store subcluster load/space state in state store.

Correct. Right now we are not even storing load/space data in the State Store. Actually, in our Rebalancer prototypes we are collecting the space externally. For now, we will keep the usage state out of the State Store, and once we get to the Rebalancer we can discuss what's best.

bq. Admin's modification of mount table. Besides rebalancer, admin might want to update mount table during cluster initial setup as well as addition of new namespace with new mount entry. If we continue to use mounttable.xml, then admins can push the update the same way as viewFs setup. If we use ZK store, then we need to provide tools to update state store.

Right now, our admin tool goes through the Routers to modify the mount table. We could also go directly to the State Store. I just created HDFS-10646 to develop this.

bq. What is the performance optimization in your latest patch, based on async RPC client?

Our current optimization is based on being able to use more sockets. The current client has a single thread pool per connection and we were limited by this. We haven't explored async extensively, and we are not yet sure it will give us the performance we need. We need to explore this. I'll update the document accordingly.
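
For the inconsistent-state answer above, the retry order can be sketched roughly as follows. This is only an illustration and not the code in HDFS-10629: {{RouterRetrySketch}}, {{NamenodeCall}} and {{invoke}} are hypothetical names, and the nested {{StandbyException}} is a local stand-in so the sketch compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RouterRetrySketch {

    /** Local stand-in for Hadoop's StandbyException (assumption: keeps the sketch Hadoop-free). */
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    /** Hypothetical RPC invocation against a single namenode. */
    interface NamenodeCall<T> {
        T apply(String namenode) throws StandbyException;
    }

    /**
     * Tries the namenode that the cached state marks as active first,
     * then falls back to the remaining namenodes if it reports standby.
     */
    static <T> T invoke(List<String> namenodes, String cachedActive, NamenodeCall<T> call)
            throws StandbyException {
        // Put the cached-active namenode (the "hint") at the front of the list.
        List<String> ordered = new ArrayList<>();
        if (cachedActive != null && namenodes.contains(cachedActive)) {
            ordered.add(cachedActive);
        }
        for (String nn : namenodes) {
            if (!nn.equals(cachedActive)) {
                ordered.add(nn);
            }
        }
        StandbyException last = null;
        for (String nn : ordered) {
            try {
                return call.apply(nn);
            } catch (StandbyException e) {
                // The cached state was stale for this namenode; try the next one.
                last = e;
            }
        }
        if (last != null) {
            throw last;
        }
        throw new IllegalStateException("No namenodes configured");
    }

    public static void main(String[] args) throws Exception {
        List<String> namenodes = Arrays.asList("nn1", "nn2");
        // The cache says nn1 is active, but in this example nn2 actually is.
        String result = invoke(namenodes, "nn1", nn -> {
            if (!nn.equals("nn2")) {
                throw new StandbyException(nn + " is in standby");
            }
            return "served by " + nn;
        });
        System.out.println(result);
    }
}
```

A real Router would presumably also update the cached active state after a successful fallback; that part is left out here.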

> Router-based HDFS federation
> ----------------------------
>
>                 Key: HDFS-10467
>                 URL: https://issues.apache.org/jira/browse/HDFS-10467
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.7.2
>            Reporter: Inigo Goiri
>            Assignee: Inigo Goiri
>         Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch, HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch
>
>
> Add a Router to provide a federated view of multiple HDFS clusters.