Dear All,
We are maintaining a 60-node hadoop cluster for external users, and
would like to be automatically notified via email when an HDFS crash or
some other infrastructure failure occurs that is not due to a user
programming error. We've also been encountering "soft" errors, where
hadoop does not crash, but becomes very slow, and jobs hang for a long
time and then fail.
Are there existing tools that provide this capability? Or do we have to
manually monitor the web services at http://namenode and
http://namenode:50030?
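
For reference, here is a rough sketch of the kind of check we would
otherwise have to script ourselves. It only polls the two web pages
mentioned above and emails a summary if either is unreachable, returns a
non-200 status, or responds slowly. The thresholds, the email addresses,
the SMTP host, and all function names are placeholders we made up for
illustration; it would not by itself detect hung jobs.

```python
import smtplib
import time
import urllib.request
from email.message import EmailMessage

# URLs are the ones from this message; everything else below is a
# hypothetical placeholder (thresholds, addresses, SMTP host).
ENDPOINTS = ["http://namenode", "http://namenode:50030"]
TIMEOUT_S = 10.0   # give up on a page after this many seconds
SLOW_S = 5.0       # report a page as slow above this many seconds


def fetch(url, timeout=TIMEOUT_S):
    """Return the HTTP status of url. Raises on timeouts, refused
    connections, and (via urllib's HTTPError) 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe(url, fetcher=fetch, clock=time.monotonic):
    """Return a one-line problem description for url, or None if healthy."""
    start = clock()
    try:
        status = fetcher(url)
    except Exception as exc:
        return f"{url}: unreachable ({exc})"
    elapsed = clock() - start
    if status != 200:
        return f"{url}: HTTP {status}"
    if elapsed > SLOW_S:
        return f"{url}: slow response ({elapsed:.1f}s)"
    return None


def build_alert(problems,
                sender="hadoop-monitor@example.org",
                recipient="admins@example.org"):
    """Build the notification email listing one problem per line."""
    msg = EmailMessage()
    msg["Subject"] = f"Hadoop cluster alert: {len(problems)} problem(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg


def run_once(smtp_host="localhost"):
    """Probe every endpoint once and send a single email if any failed."""
    problems = [p for p in (probe(u) for u in ENDPOINTS) if p]
    if problems:
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(build_alert(problems))
    return problems
```

Something like this could be run from cron every few minutes by calling
run_once(), but we would much rather use an existing tool if one exists.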
Thank you so much,
Oren
--
"We plan ahead, which means we don't do anything right now."
-- Valentine (Tremors)