Hi David,

On Fri, Apr 2, 2010 at 6:16 PM, David Howell <dehow...@gmail.com> wrote:

> I'm encountering a completely bizarre failure mode in my Hadoop
> cluster. A week ago, I switched from vanilla Apache Hadoop 0.20.1 to
> CDH 2.
>
> Ever since then, my tasktracker/datanode machines have been regularly
> losing their networking during long (> 1 hour) jobs. Restarting the
> network interface brings them back online immediately.
>
>
Could you clarify what you mean by "losing their networking"? Can you ping
the node externally? If you access the node via the console (via ILOM, etc.)
and run tcpdump or tshark, can you see any Ethernet broadcast traffic? Do
you see anything in dmesg on the machine in question?

Thanks
-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera
