[
https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867754#action_12867754
]
Hairong Kuang commented on HDFS-599:
------------------------------------
> This of course doesn't help solve the problem of malicious clients still
> accessing the service port by hacking the values in the code.
I am not talking about a malicious client. What if a misconfigured client
happens to choose the service port as its client port?
> removing the ClientProtocol from the service port will effectively make it
> impossible for administrator to perform any client operations like LS, or
> even getting out of safemode
You should break the current ClientProtocol into AdminProtocol and the real
ClientProtocol.
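
To make the suggested split concrete, here is a minimal sketch of how the
protocols might be divided between the two ports. Everything in it is an
assumption for illustration: the interfaces are simplified stand-ins (not the
real HDFS ones), the method names are hypothetical, and PortPolicy is not part
of the attached patch.

```java
// Simplified stand-ins; names and methods are hypothetical, not the real HDFS interfaces.
interface DatanodeProtocol { void sendHeartbeat(String datanodeId); }

// Operations an administrator still needs on the service port (e.g. safemode).
interface AdminProtocol { boolean setSafeMode(boolean enter); }

// Ordinary file system operations issued by regular clients.
interface ClientProtocol { String[] listPaths(String dir); }

final class PortPolicy {
    enum Port { CLIENT, SERVICE }

    // Returns true if the given port is willing to serve the given protocol.
    static boolean accepts(Port port, Class<?> protocol) {
        switch (port) {
            case SERVICE:
                // Assumption: the service port serves datanodes and administrators only.
                return protocol == DatanodeProtocol.class || protocol == AdminProtocol.class;
            case CLIENT:
                // Assumption: the client port serves only regular client calls.
                return protocol == ClientProtocol.class;
            default:
                return false;
        }
    }
}
```

Under this policy a misconfigured client that connects to the service port
would be rejected at the protocol level, while an administrator could still
leave safemode through AdminProtocol.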
> Improve Namenode robustness by prioritizing datanode heartbeats over client
> requests
> ------------------------------------------------------------------------------------
>
> Key: HDFS-599
> URL: https://issues.apache.org/jira/browse/HDFS-599
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Attachments: HDFS-599.patch
>
>
> The namenode processes RPC requests from clients that are reading/writing to
> files as well as heartbeats/block reports from datanodes.
> Sometimes, for various reasons (Java GC runs, inconsistent performance
> of the NFS filer that stores HDFS transaction logs, etc.), the namenode
> encounters transient slowness. For example, if the device that stores the
> HDFS transaction logs becomes sluggish, the Namenode's ability to process
> RPCs slows down to a certain extent. During this time, the RPCs from clients
> as well as the RPCs from datanodes suffer in similar fashion. If the
> underlying problem becomes worse, the NN's ability to process a heartbeat
> from a DN is severely impacted, thus causing the NN to declare that the DN is
> dead. Then the NN starts replicating blocks that used to reside on the
> now-declared-dead datanode. This adds extra load to the NN. Then the
> now-declared-dead datanode finally re-establishes contact with the NN and sends a
> block report. The block report processing on the NN is another heavyweight
> activity, thus causing more load on the already overloaded namenode.
> My proposal is that the NN should try its best to continue processing RPCs
> from datanodes and give lesser priority to serving client requests. The
> Datanode RPCs are integral to the consistency and performance of the Hadoop
> file system, and it is better to protect them at all costs. This will ensure
> that the NN recovers from the hiccup much faster than it does now.
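
One way to picture the proposal in the description is a call queue that always
hands out pending datanode RPCs before client RPCs. The sketch below is only
an illustration under that assumption; PrioritizedCallQueue, Call, and Source
are hypothetical names, and this is not necessarily how the attached patch
works.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: strict priority of datanode calls over client calls.
final class PrioritizedCallQueue {
    enum Source { DATANODE, CLIENT }

    static final class Call {
        final Source source;
        final String name;

        Call(Source source, String name) {
            this.source = source;
            this.name = name;
        }
    }

    private final Queue<Call> datanodeCalls = new ArrayDeque<>();
    private final Queue<Call> clientCalls = new ArrayDeque<>();

    // Files the call into the queue that matches its origin.
    synchronized void add(Call call) {
        if (call.source == Source.DATANODE) {
            datanodeCalls.add(call);
        } else {
            clientCalls.add(call);
        }
    }

    // Returns the oldest datanode call if one is waiting, otherwise the
    // oldest client call, or null when both queues are empty.
    synchronized Call poll() {
        Call next = datanodeCalls.poll();
        return next != null ? next : clientCalls.poll();
    }
}
```

With this ordering, a heartbeat queued behind many client requests is still
served first. The trade-off is that strict priority can starve clients while
datanode traffic is heavy.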