Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

Steve Loughran Tue, 21 Apr 2009 07:47:02 -0700

Andrew Newman wrote:

They are comparing an indexed system with one that isn't.  Why is
Hadoop faster at loading than the others?  Surely no one would be
surprised that it would be slower - I'm surprised at how well Hadoop
does.  Who wants to write a paper for next year, "grep vs reverse
index"?


2009/4/15 Guilherme Germoglio <germog...@gmail.com>:

(Hadoop is used in the benchmarks)

http://database.cs.brown.edu/sigmod09/

I think it is interesting, though it misses the point that the reason few datasets exceed 1PB today is that nobody could afford to store or process the data. With Hadoop, the cost is somewhat high (you have to learn to patch the source to fix your cluster's problems), but it scales well with the number of nodes. Commodity storage costs (my own home now has >2TB of storage) and commodity software costs are compatible.


Some other things to look at:

- power efficiency. I actually think the DBs could come out better here.
- ease of writing applications by skilled developers: Pig vs SQL.
- performance under different workloads (take a set of log files growing continually and mine it in near-real time; I think the last.fm use case would be a good one).

One of the great ironies of SQL is that most developers don't go near it, as it is a detail handled by the O/R mapping engine, except when building SQL selects for web pages. If Pig makes M/R easy, would it be used? And if so, does that show that we developers prefer procedural thinking?


-steve
