On Fri, Sep 25, 2009 at 9:25 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> Hey Paul,
>
> Here's another visualization one can do with HDFS:
>
> http://www.youtube.com/watch?v=qoBoEzOkeDQ
>
> Each time data is moved from one host to another, it is plotted as a drop
> of water falling from the square representing the source host to the square
> representing the destination. The color of a node's square depends on its
> number of transfers per second. Data transferred in or out of the cluster
> is represented by drops entering or leaving through the ceiling.
>
> Hard to describe, easy to understand when you see it. Absolutely
> mesmerizing for tour groups when you put it on a big screen.
>
> Brian
>
> On Sep 25, 2009, at 2:04 AM, Paul Smith wrote:
>
>> Hi,
>>
>> I'm still relatively new to Hadoop here, so bear with me. We have a few
>> ex-SGI staff with us, and one of the tools we now use at Aconex is
>> Performance Co-Pilot (PCP), an open-source performance-monitoring suite
>> out of Silicon Graphics (see [1]). SGI are a bit fond of large-scale
>> problems, and this toolset was built to support their own monster
>> computers (see [2] for one of their clients; yes, that's one large single
>> computer). PCP was used to monitor and tune that machine, so I'm pretty
>> confident it has the credentials to help with Hadoop.
>>
>> Aconex has built a Java bridge to PCP and has open-sourced it as Parfait
>> (see [3]). We rely on it for real-time and post-problem retrospective
>> analysis; we would be dead in the water without it. By combining hardware
>> and software metrics from multiple machines into a single warehouse of
>> data, we can correlate many interesting things and solve problems very
>> quickly.
>>
>> Now I want to unleash this on Hadoop. I have written a MetricsContext
>> extension that uses the bridge, and I can export counters and values to
>> PCP for the namenode, datanode, jobtracker and tasktracker.
>> We are building some small tool extensions to allow 3D visualization. A
>> first fledgling view of what it looks like is here:
>>
>> http://people.apache.org/~psmith/clustervis.png
>>
>> Yes, a pretty trivial cluster at the moment, but the toolset allows
>> fairly simple configurations to create the cluster view by passing it the
>> masters/slaves files. Once the PCP tools connect to each node through my
>> implementation of the PCP MetricsContext, they can find out whether it is
>> a namenode, a jobtracker, etc., and display it differently. We hope to
>> improve the tools to use the DNSToSwitchMapping style to visualize all
>> the nodes within the cluster as they would appear in their racks. PCP
>> already has support for Cisco switches, so we can also integrate those
>> into the picture and display inter-rack network volumes. The real payoff
>> here is the retrospective analysis: all this PCP data is collected into
>> archives, so the view can be replayed at any time, and at any pace you
>> want. Very interesting problems turn up when you have that sort of tool.
>>
>> I guess my question is whether anyone else thinks this would be of value
>> to the wider Hadoop community. Obviously we do, but we're not exactly
>> stretching Hadoop just yet, nor do we fully understand some of the tricky
>> performance problems that admins of large Hadoop clusters face. We'd love
>> to add this to hadoop-contrib, though, hoping others might find it
>> useful.
>>
>> So if anyone is interested in asking questions or suggesting crucial
>> feature sets, we'd appreciate it.
>>
>> cheers (and thanks for getting this far in the email.. :) )
>>
>> Paul Smith
>> psmith at aconex.com
>> psmith at apache.org
>>
>> [1] Performance Co-Pilot (PCP)
>> http://oss.sgi.com/projects/pcp/index.html
>>
>> [2] NASA's 'Columbia' computer
>> http://www.nas.nasa.gov/News/Images/images.html
>>
>> [3] Parfait
>> http://code.google.com/p/parfait/
Open up a JIRA. Let's get Hadoop viz on the namenode web interface in real time :)
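[Editor's note: Paul's MetricsContext extension isn't shown in the thread. In Hadoop of that era (0.20.x) the pluggable extension point was org.apache.hadoop.metrics.spi.AbstractMetricsContext, whose subclasses accumulate record updates and are flushed to a backend on a timer. The sketch below illustrates that accumulate-then-flush pattern in a self-contained way; the MetricExporter interface and the metric name are hypothetical stand-ins, not Parfait's or Hadoop's actual API.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch of a MetricsContext-style sink: counter updates are
 * accumulated in memory, then pushed to an external monitor (PCP via the
 * Parfait bridge, in Paul's case) on each periodic flush.
 */
public class PcpStyleMetricsSink {

    /** Hypothetical stand-in for the bridge that hands values to PCP. */
    interface MetricExporter {
        void export(String metricName, long value);
    }

    private final Map<String, Long> counters = new LinkedHashMap<>();
    private final MetricExporter exporter;

    public PcpStyleMetricsSink(MetricExporter exporter) {
        this.exporter = exporter;
    }

    /** Accumulate a counter delta, as a metrics update would deliver it. */
    public void incrementCounter(String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }

    /** Push current values out, as the timer-driven flush would. */
    public void flush() {
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            exporter.export(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Long> seen = new LinkedHashMap<>();
        PcpStyleMetricsSink sink = new PcpStyleMetricsSink(seen::put);
        // Example metric name is illustrative only.
        sink.incrementCounter("dfs.datanode.bytes_written", 1024);
        sink.incrementCounter("dfs.datanode.bytes_written", 512);
        sink.flush();
        System.out.println(seen.get("dfs.datanode.bytes_written")); // 1536
    }
}
```

A real implementation would subclass AbstractMetricsContext, read its polling period from the hadoop-metrics configuration, and translate each OutputRecord's tags and metrics into PCP instance-domain values instead of a flat map.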