On 02/17/2015 05:16 PM, Ellis H. Wilson III wrote:
On 02/17/2015 04:56 PM, Prentice Bisbal wrote:
Why do you think 'Big Data' techniques would be applicable to this?
A large amount of data != big data.
Heh. Let's not pretend like 'big data' means anything of substance
now :D.
'Big Data' techniques are typically for finding trends in unstructured
data from multiple sources, whereas the output of scientific simulations
is usually from a single source in some sort of structured format. I
just don't see any applicability here whatsoever.
I would argue this is perhaps a bit overly specific. This might be
the typical use case, but certainly there is no reason why Hadoop and
MapReduce couldn't be used to do simple filtering of scientific
simulation output. If you were looking for places in a huge output
file where temperature falls within some range and elevation
has a specific value, I could certainly see value in applying an
easily programmable scaling framework to basically "smart grep"
through your data. Hadoop/MR could certainly help you do that.
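For what it's worth, the "smart grep" described above can be sketched as a map-only filter. This is only a rough illustration, not code from the thread: the record layout (one "x y elevation temperature" line per grid point), the thresholds, and the names matches and mapper are all made up for the example. Real simulation output in HDF5 or NetCDF would need a reader in front of this.

```python
# Map-only "smart grep" sketch. Every name and value below is hypothetical.

# Assumed record layout: one whitespace-separated line per grid point,
# "x y elevation temperature".
TEMP_MIN, TEMP_MAX = 280.0, 290.0   # illustrative temperature range
TARGET_ELEVATION = 1500.0           # illustrative elevation value
ELEVATION_TOL = 1e-6                # tolerance for the float comparison


def matches(line):
    """Return True if the record satisfies both filter conditions."""
    fields = line.split()
    if len(fields) != 4:
        return False  # skip malformed records rather than failing the task
    try:
        _x, _y, elevation, temperature = map(float, fields)
    except ValueError:
        return False  # non-numeric field
    return (TEMP_MIN <= temperature <= TEMP_MAX
            and abs(elevation - TARGET_ELEVATION) <= ELEVATION_TOL)


def mapper(lines):
    """Emit matching records unchanged; each line is handled independently."""
    for line in lines:
        if matches(line):
            yield line.rstrip("\n")


# Small in-memory stand-in for one input split.
sample = [
    "0 0 1500.0 285.2\n",   # both conditions hold
    "0 1 1500.0 295.0\n",   # temperature out of range
    "1 0 1200.0 284.0\n",   # wrong elevation
    "garbage line\n",       # malformed
    "1 1 1500.0 280.0\n",   # lower bound is inclusive
]
hits = list(mapper(sample))
for record in hits:
    print(record)
```

Because every record is tested on its own, the work splits cleanly across input chunks, which is the part a framework like Hadoop handles for you. Under Hadoop Streaming, which runs any executable that reads records on stdin and writes results to stdout, the same mapper could be fed sys.stdin instead of the sample list, with the number of reducers set to zero since a pure filter needs no reduce step.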
I was intentionally being specific, to counter the general lack of
specificity surrounding the term 'Big Data'. ;)
As you mentioned, though, many output formats for scientific data are
well structured, such as HDF5. This doesn't mean you have a good
file system or a good parallel programming paradigm for doing
stupid-simple things with that data afterwards. You just have a good
container format.
Hadoop could provide the other bits you need. A paper from the HDF5
group actually does a decent job of pointing out these kinds of
differences, how you might get HDF5 containers in and out of HDFS and
what impacts performance:
http://www.hdfgroup.org/HDF5/faq/hadoop.html
As they note in the paper, a recent project called SciHadoop (I was
lucky enough to talk in the same slot as its author at SC a year back)
works directly with NetCDF-formatted files, so that could be another
option. Whether or not the source is available for SciHadoop is beyond
my knowledge, but a quick google would likely give you that answer.
If you are asking, "should I do weather simulation using Hadoop or
some other big data framework," my answer is a resounding NO. MR has
VERY different (often far more limited) semantics and guarantees
than other parallel programming paradigms, and you will almost
certainly get burned if you try to shove a climate-shaped peg through
the square hole that is MR. This is probably what Prentice was
getting at.
That's EXACTLY what I was getting at. A hammer is a good tool for
nailing pieces of wood together, but I wouldn't use it to cut down a tree.
Prentice
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing