On 11/11/10 13:09, Michael Segel wrote:
The only time you'd want to look at a configurable cluster is if you're doing
HoD and you don't need to persist your data sets for long periods of time.
We run virtual private Hadoop clouds against persistent storage for
various other reasons:
- it lets us reuse the same machines for other work.
- it lets people play with Hadoop with a small sample of their data to see
if it works for their app, without spending any money.
If it does work, that's when we say "you now need a real Hadoop
cluster". One interesting trend now is that you can buy 1U servers with
2*12 TB of HDD and 6-12 cores (*). This gives you 1 PB of raw storage
and the compute to go with it, in under 50 servers, which is incredible
given how many machines were needed for that even a few years back.
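
As a rough check on that arithmetic, here is a minimal sketch. It assumes
24 TB of raw disk per 1U server (reading "2*12 TB" as 24 TB in total) and
takes 1 PB as 1000 TB; the function name and constants are mine, purely for
illustration, and it counts raw capacity only, before any replication.

```python
import math

# Assumption: 24 TB of raw disk per 1U server (the "2*12 TB" figure).
RAW_TB_PER_SERVER = 24


def servers_needed(target_tb: float, tb_per_server: float = RAW_TB_PER_SERVER) -> int:
    """Smallest whole number of servers whose raw disk reaches target_tb."""
    return math.ceil(target_tb / tb_per_server)


if __name__ == "__main__":
    servers = servers_needed(1000)  # 1 PB taken as 1000 TB
    print(f"servers for 1 PB raw: {servers}")
    # Core count across those servers at 6 and 12 cores per box.
    print(f"total cores: {servers * 6} to {servers * 12}")
```

Under those assumptions it comes out at 42 servers (43 if you count a PB as
1024 TB), so "under 50" holds either way.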
There are deficiencies in Hadoop and more in HBase. Yet even with those
deficiencies, it's still a good tool set and over time the deficiencies will be
addressed.
+1
-Steve