[ https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942437#comment-16942437 ]
Ayush Saxena commented on HDFS-14284:
-------------------------------------

[~brahmareddy] I agree we should shut down the Router, given the {{fail-fast}} mechanism. But identifying every scenario (and I do mean every one) where we should do this doesn't seem to be a trivial task. Even after all these years, we haven't covered all the cases where the Namenode should terminate if it isn't able to serve requests (I remember fixing one such missed case for the NN a couple of months back).

From personal experience, with the fix in front of you these issues may appear simple, but finding the root cause is quite difficult in such cases. At least in the case of the Namenode we know where to check, since there is only one active NN, which is unlikely in an RBF deployment. With 40+ Routers, as Inigo mentioned, finding the culprit Router would be quite time-consuming. IMO propagating the routerID back is well worth it.

bq. and wn't be incompatiable if there is some automation.

Are you talking about scripts that parse the message, which might fail because of the added routerID? If so, we can guard that with a config that defaults to false; an admin who has a big deployment and wants this can enable it. If something else bothers you, let us know. We should make sure we don't violate the compat guidelines in any way.

> RBF: Log Router identifier when reporting exceptions
> ----------------------------------------------------
>
>                 Key: HDFS-14284
>                 URL: https://issues.apache.org/jira/browse/HDFS-14284
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch
>
>
> The typical setup is to use multiple Routers through ConfiguredFailoverProxyProvider.
> In a regular HA Namenode setup, it is easy to know which NN was used.
> However, in RBF, any Router can be the one reporting the exception and it is hard to know which was the one.
> We should have a way to identify which Router/Namenode was the one triggering the exception.
> This would also apply with Observer Namenodes.
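
A minimal sketch of the config-gated idea from the comment, assuming a boolean setting that defaults to false so that scripts parsing existing messages keep working. The class name, the config key, and the message suffix format are all hypothetical and are not taken from the HDFS-14284 patches; plain java.util.Properties stands in for the real Hadoop configuration.

```java
import java.util.Properties;

/** Hypothetical helper; not part of Hadoop or of the HDFS-14284 patches. */
public class RouterExceptionTagger {

  /** Hypothetical config key; assumed to default to false for compatibility. */
  public static final String INCLUDE_ROUTER_ID_KEY =
      "example.router.exception.include-router-id";

  private final String routerId;
  private final boolean includeRouterId;

  public RouterExceptionTagger(String routerId, boolean includeRouterId) {
    this.routerId = routerId;
    this.includeRouterId = includeRouterId;
  }

  /** Builds a tagger from properties; a missing key means the feature is off. */
  public static RouterExceptionTagger fromConfig(String routerId, Properties conf) {
    boolean enabled =
        Boolean.parseBoolean(conf.getProperty(INCLUDE_ROUTER_ID_KEY, "false"));
    return new RouterExceptionTagger(routerId, enabled);
  }

  /**
   * Returns the message unchanged when the feature is disabled, so existing
   * message parsers see no difference; otherwise appends the Router id.
   */
  public String tag(String message) {
    if (!includeRouterId) {
      return message;
    }
    return message + " (router: " + routerId + ")";
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(INCLUDE_ROUTER_ID_KEY, "true");
    RouterExceptionTagger tagger = fromConfig("router-17", conf);
    System.out.println(tagger.tag("No namenode available"));
  }
}
```

With the key unset, tag() is a no-op, which is the compatibility property the comment asks for; an admin with a large deployment would opt in explicitly.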