Yiqun,
Is this related to HDFS-9260?
Note that HDFS-9260 has been backported to CDH5.7 and above.

I'm interested to learn more. Did you observe clients failing to close files
due to an insufficient number of block replicas? Did the NN fail over?
Did you have GC logging enabled? Any chance you could take a heap dump and
analyze what's in there?
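
If GC logs are not available, one low-cost way to see how old gen grows over
the week is to poll the NameNode's /jmx servlet periodically. Below is a
minimal sketch, not a tested tool. It assumes the NN web UI listens on the
Hadoop 2.x default port 50070; the host name "nn-host" is hypothetical. The
old gen pool is named "CMS Old Gen", "PS Old Gen" or "G1 Old Gen" depending
on the collector in use, so the sketch matches on the common suffix.

```python
import json
from urllib.request import urlopen

# Hypothetical NameNode address; 50070 is the default NN HTTP port in Hadoop 2.x.
JMX_URL = "http://nn-host:50070/jmx?qry=java.lang:type=MemoryPool,*"


def old_gen_usage(jmx_doc):
    """Return (pool_name, used_bytes, max_bytes) for the old gen pool, or None."""
    for bean in jmx_doc.get("beans", []):
        # Object name looks like "java.lang:type=MemoryPool,name=CMS Old Gen".
        object_name = bean.get("name", "")
        if object_name.endswith("Old Gen"):
            usage = bean.get("Usage", {})
            pool_name = object_name.split("name=")[-1]
            return pool_name, usage.get("used"), usage.get("max")
    return None


def fetch_old_gen(url=JMX_URL):
    """Fetch the MemoryPool beans from the NN /jmx servlet and extract old gen."""
    with urlopen(url, timeout=10) as resp:
        return old_gen_usage(json.load(resp))
```

Calling fetch_old_gen() from cron every few minutes and logging the result
would show whether old gen climbs steadily between restarts or jumps at
particular times.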

There were quite a few NN scalability and GC improvements in the CDH5.5 to
CDH5.8 time frame. We have customers at or beyond your scale on your version,
but I don't think I've heard of similar symptoms.

Regards

On Tue, Sep 25, 2018 at 2:04 AM Lin,Yiqun(vip.com) <yiqun01....@vipshop.com>
wrote:

> Hi hdfs developers:
>
> We hit a serious problem after a rolling upgrade of our Hadoop version from
> 2.5.0-cdh5.3.2 to 2.6.0-cdh5.13.1. The NN becomes slow periodically, on
> roughly a weekly cycle. For example, if we start the NN on Monday, it runs
> fast at first, but by the weekend our cluster becomes very slow.
>
> At first, we thought FSN lock contention might be the cause, so we made
> some improvements there, e.g. making the block removal interval
> configurable and printing the FSN lock elapsed time. After this, the
> problem still exists :(. So we suspect this may not be an HDFS RPC problem.
>
> Finally, we found a related phenomenon: every time the NN runs slow, its old
> gen reaches a high value, around 100GB. The NN's total metadata size is
> only around 40GB in our cluster. As a temporary workaround, we reduced
> the heap size and now trigger full GC frequently. It looks better than
> before, but we haven't found the root cause. We are not sure whether this
> is a JVM tuning problem or an HDFS bug.
>
> Has anyone met a similar problem in this version? Why would the NN old gen
> space increase so much?
>
> Some information about our environment:
> JDK 1.8
> 500+ nodes, 150 million blocks, around 40GB of metadata.
>
> We would appreciate any comments.
>
> Thanks
> Yiqun.


-- 
A very happy Clouderan
