[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions

Ayush Saxena (Jira) Tue, 01 Oct 2019 19:30:15 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942437#comment-16942437
 ]


Ayush Saxena commented on HDFS-14284:
-------------------------------------

[~brahmareddy] I agree we should shut down the router, considering the 
{{fail-fast}} mechanism. But identifying every scenario (I mean all of them) 
where we should do this doesn't seem to be a trivial task.
Even after all these years, we still haven't covered all the cases where the 
namenode should terminate if it isn't able to serve requests (I remember 
fixing one such missed case for the NN a couple of months back). From personal 
experience, with the fix in front of your eyes these issues may appear simple, 
but finding the root cause is quite difficult in such cases. At least for the 
namenode we know where to check, since there is only one active NN, which is 
not the case for an RBF deployment. With 40+ Routers, as Inigo mentioned, 
finding the culprit Router would be quite a time-consuming affair. IMO 
propagating back the routerID is well worth it.

bq.  and won't be incompatible if there is some automation.

Are you talking about scripts that parse the message and might fail due to the 
addition of the routerID? If so, we can guard that with a config that defaults 
to false; an admin who has a big deployment and wants this can enable it. 
If something else is bothering you, let us know; we should ensure that we 
don't sidestep the Compat guidelines in any way.
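
As a rough sketch of the config-gated idea (not taken from the attached 
patches), something like the following could work. The class name, the config 
key and the message format are all assumptions made up for illustration, and a 
real change would probably need to preserve the original exception class 
rather than wrap it:

```java
import java.io.IOException;

/** Hypothetical helper; none of these names come from the HDFS-14284 patches. */
class RouterIdTagger {
  // Assumed config key, invented for this sketch. It would default to false.
  static final String INCLUDE_ROUTER_ID_KEY =
      "example.router.exception.include-router-id";

  private final boolean enabled;
  private final String routerId;

  RouterIdTagger(boolean enabled, String routerId) {
    this.enabled = enabled;
    this.routerId = routerId;
  }

  /**
   * Returns the exception untouched when the feature is off, so existing
   * scripts that parse the message keep working. When it is on, wraps the
   * exception and appends the Router identifier to the message.
   */
  IOException tag(IOException e) {
    if (!enabled) {
      return e;
    }
    return new IOException(e.getMessage() + " [routerId=" + routerId + "]", e);
  }
}
```

With the flag off the message is unchanged; with it on, the client would see 
the original message followed by the identifier of the Router that reported it.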

> RBF: Log Router identifier when reporting exceptions
> ----------------------------------------------------
>
>                 Key: HDFS-14284
>                 URL: https://issues.apache.org/jira/browse/HDFS-14284
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch
>
>
> The typical setup is to use multiple Routers through 
> ConfiguredFailoverProxyProvider.
> In a regular HA Namenode setup, it is easy to know which NN was used.
> However, in RBF, any Router can be the one reporting the exception and it is 
> hard to know which was the one.
> We should have a way to identify which Router/Namenode was the one triggering 
> the exception.
> This would also apply with Observer Namenodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
