On Nov 23, 2013, at 1:40 PM, Joe Landman <[email protected]> wrote:
> That is, we as a community have much to offer the growing big data
> community.
I think this is completely true, and somewhat urgent. The two communities have
a lot to teach each other.
The big data community remains incredibly naive about a lot of performance and
scalability issues - understandably so, since they've only been at this for a
few years. Traditional HPC has a *lot* of hard-won knowledge and experience to
offer.
But conversely, we have been naive about the importance of easily deployable,
scalable, easy-to-develop-for software frameworks, even when they initially come
at a substantial cost in single-processor performance. If we choose not to learn
the lessons of the rapid growth of tools like Hadoop, we are in trouble as a
community.
We’ve talked for years about how hardware is advancing more rapidly than
software, but we haven’t done much about it; now someone has, and it’s not us.
As a result, people are already trying to fit very HPCy sorts of problems into
Hadoopy sorts of frameworks (cf. all the BSP stuff in Pregel or Hama) because
it’s so much easier to get things working, and so much easier to find
developers to maintain the result. When it comes to choosing a direction for a
new project, 100x the number of developers will always win over
single-processor performance, or even over scaling, because you can then direct
enormous amounts of resources to fixing performance issues in the underlying
frameworks.
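For anyone who hasn’t written one of these: in the Pregel/Hama BSP model you
supply a per-vertex compute() function, and the framework handles message
delivery and the superstep barriers. Below is a minimal sketch of the idea -
single-source shortest paths, the canonical example from the Pregel paper -
written against hypothetical interfaces of my own, not Hama’s or Pregel’s
actual API, just to show how little code an HPCy graph problem takes in this
model:

import java.util.*;

// A minimal sketch of a Pregel-style BSP computation: single-source
// shortest paths over a tiny in-memory graph. The Vertex/superstep
// machinery here is hypothetical and stands in for what Pregel or Hama
// provide; it is not either framework's actual API.
public class BspSsspSketch {

    static class Vertex {
        final int id;
        final Map<Integer, Integer> edges = new HashMap<>(); // neighbour id -> edge weight
        int distance = Integer.MAX_VALUE;                    // best distance seen so far

        Vertex(int id) { this.id = id; }

        // One superstep for one vertex: absorb incoming messages, and if
        // our distance improved, propagate updated distances to neighbours.
        void compute(List<Integer> incoming, Map<Integer, List<Integer>> outbox) {
            int best = distance;
            for (int msg : incoming) best = Math.min(best, msg);
            if (best < distance) {
                distance = best;
                for (Map.Entry<Integer, Integer> e : edges.entrySet()) {
                    outbox.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                          .add(distance + e.getValue());
                }
            }
        }
    }

    public static void main(String[] args) {
        // Toy graph: 0->1 (weight 4), 0->2 (1), 2->1 (2), 1->3 (1)
        Map<Integer, Vertex> graph = new HashMap<>();
        for (int i = 0; i < 4; i++) graph.put(i, new Vertex(i));
        graph.get(0).edges.put(1, 4);
        graph.get(0).edges.put(2, 1);
        graph.get(2).edges.put(1, 2);
        graph.get(1).edges.put(3, 1);

        // Superstep 0: the source vertex "receives" distance 0.
        Map<Integer, List<Integer>> mailbox = new HashMap<>();
        mailbox.put(0, List.of(0));

        // This loop is the framework's job: run supersteps with a global
        // barrier between them, delivering each superstep's outbox at the
        // start of the next, until no messages are in flight. (Real Pregel
        // also tracks vote-to-halt; plain message-draining suffices here.)
        int supersteps = 0;
        while (!mailbox.isEmpty()) {
            Map<Integer, List<Integer>> outbox = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> m : mailbox.entrySet())
                graph.get(m.getKey()).compute(m.getValue(), outbox);
            mailbox = outbox; // barrier: messages become visible next superstep
            supersteps++;
        }

        System.out.println("converged after " + supersteps + " supersteps");
        for (Vertex v : graph.values())
            System.out.println("vertex " + v.id + ": distance " + v.distance);
    }
}

The point isn’t the fifty lines; it’s that the barrier, the message routing,
and the deployment are all somebody else’s problem - which is exactly why
people reach for these frameworks first.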
Jonathan
--
Jonathan Dursi, <[email protected]>
SciNet HPC Consortium, Compute Canada
http://www.SciNetHPC.ca
http://www.ComputeCanada.ca
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf