Andrew Newman wrote:
They are comparing an indexed system with one that isn't. Why is
Hadoop faster at loading than the others? Surely no one would be
surprised that it would be slower - I'm surprised at how well Hadoop
does. Who wants to write a paper for next year, "grep vs reverse
index"?
2009/4/15 Guilherme Germoglio <germog...@gmail.com>:
(Hadoop is used in the benchmarks)
http://database.cs.brown.edu/sigmod09/
I think it is interesting, though it misses the point that the reason
few datasets are >1PB today is that nobody could afford to store or
process the data. With Hadoop, the cost is somewhat high (you learn to
patch the source to fix your cluster's problems), but it scales well
with the number of nodes. Commodity storage costs (my own home now has
>2TB of storage) and commodity software costs are compatible.
Some other things to look at:
-power efficiency. I actually think the DBs could come out better.
-ease of writing applications by skilled developers. Pig vs SQL.
-performance under different workloads (take a set of log files growing
continually and mine them in near real time. I think the last.fm use
case would be a good one).
One of the great ironies of SQL is that most developers don't go near
it, as it is a detail handled by the O/R mapping engine, except when
building SQL selects for web pages. If Pig makes M/R easy, would it be
used, and if so, does that show that we developers prefer procedural
thinking?
-steve