Dear All,

We maintain a 60-node Hadoop cluster for external users and would like to be notified automatically by email when HDFS crashes or some other infrastructure failure occurs that is not caused by a user programming error. We have also been encountering "soft" errors, where Hadoop does not crash but becomes very slow, and jobs hang for a long time and then fail.

Are there existing tools that provide this capability? Or do we have to manually monitor the web services at http://namenode and http://namenode:50030?
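To make the question concrete, here is a rough sketch of the kind of home-grown polling we would like to avoid writing and maintaining. It only checks whether the two web pages above respond and how long they take; the timeout, the "slow" threshold, the email addresses, and the local SMTP relay are all placeholder assumptions, not our real settings.

```python
import smtplib
import time
import urllib.request
from email.message import EmailMessage

# URLs taken from the question; the thresholds below are assumed values.
ENDPOINTS = ["http://namenode", "http://namenode:50030"]
TIMEOUT_S = 10.0   # assumed: give up on a page after this many seconds
SLOW_S = 5.0       # assumed: responses slower than this count as "slow"


def probe(url, timeout=TIMEOUT_S, opener=urllib.request.urlopen):
    """Fetch one URL and return (status, elapsed_seconds).

    status is "ok", "slow", or "down". The opener argument exists only so
    the function can be exercised without a live cluster.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read(1)  # force at least one byte of the body to arrive
    except Exception:
        return "down", time.monotonic() - start
    elapsed = time.monotonic() - start
    return ("slow" if elapsed > SLOW_S else "ok"), elapsed


def build_alert(results, sender, recipient):
    """Return an EmailMessage listing unhealthy endpoints, or None if all are ok."""
    bad = [(url, status, secs) for url, (status, secs) in results.items()
           if status != "ok"]
    if not bad:
        return None
    msg = EmailMessage()
    msg["Subject"] = "Hadoop cluster alert: %d endpoint(s) unhealthy" % len(bad)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join("%s: %s (%.1fs)" % row for row in bad))
    return msg


def main():
    # Placeholder addresses and a local SMTP relay are assumptions.
    results = {url: probe(url) for url in ENDPOINTS}
    msg = build_alert(results, "hadoop-monitor@example.org", "ops@example.org")
    if msg is not None:
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
```

Something like this could call main() from cron every few minutes, but it would only catch an unreachable or sluggish web page, not hung jobs, which is why we are hoping an existing tool already covers this.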

Thank you so much,
Oren

--
"We plan ahead, which means we don't do anything right now."
                                              -- Valentine (Tremors)

