Andrew Newman wrote:
They are comparing an indexed system with one that isn't.  Why is
Hadoop faster at loading than the others?  Surely no one would be
surprised that it would be slower - I'm surprised at how well Hadoop
does.  Who wants to write a paper for next year, "grep vs reverse
index"?
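A minimal sketch of the trade-off being described, assuming a toy whole-word search over log lines: a grep-style system does almost nothing at load time but scans everything on each query, while an inverted ("reverse") index pays at load time to build term-to-record postings so that each query touches only one list. All function names here are hypothetical illustrations, not code from the benchmark.

```python
from collections import defaultdict

def load_for_grep(lines):
    # Hypothetical: "loading" for a scan-based system just stores the raw records.
    return list(lines)

def grep(records, term):
    # Hypothetical: every query scans every record (cost grows with total data).
    return [i for i, line in enumerate(records) if term in line.split()]

def load_inverted_index(lines):
    # Hypothetical: loading pays up front, tokenizing each record once
    # and building a term -> sorted list of record ids.
    records = list(lines)
    index = defaultdict(list)
    for i, line in enumerate(records):
        for term in set(line.split()):
            index[term].append(i)
    return records, index

def index_lookup(index, term):
    # A query only reads the posting list for that term; .get avoids
    # inserting empty entries into the defaultdict.
    return index.get(term, [])

logs = ["GET /index.html 200", "GET /missing 404", "POST /login 200"]
print(grep(load_for_grep(logs), "200"))
records, idx = load_inverted_index(logs)
print(index_lookup(idx, "200"))
```

Both approaches return the same record ids; the difference is where the work happens, which is why a load-time comparison favours the unindexed system.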

2009/4/15 Guilherme Germoglio <germog...@gmail.com>:
(Hadoop is used in the benchmarks)

http://database.cs.brown.edu/sigmod09/


I think it is interesting, though it misses the point that the reason few datasets are >1PB today is that nobody could afford to store or process the data. With Hadoop, the cost is somewhat high (you have to learn to patch the source to fix your cluster's problems), but it scales well with the number of nodes. Commodity storage costs (my own home now has >2TB of storage) and commodity software costs fit together.

Some other things to look at:

-power efficiency. I actually think the DBs could come out better here
-ease of writing applications by skilled developers: Pig vs SQL
-performance under different workloads (e.g. take a continually growing set of log files and mine it in near-real time; I think the last.fm use case would be a good one)


One of the great ironies of SQL is that most developers don't go near it, as it is a detail handled by the O/R mapping engine, except when building SQL selects for web pages. If Pig makes M/R easy, would it be used? And if so, does that show that we developers prefer procedural thinking?

-steve


