Hey Qing,

You can download Dryad and see for yourself: http://connect.microsoft.com/site/sitehome.aspx?SiteID=891. Unfortunately, there's no accompanying distributed file system. I've also never seen a benchmark of Dryad scaling beyond 300 nodes, so it's not clear that it's the "right" model for all workloads. There's certainly room for a richer set of physical operators in the Hadoop project. The nice thing about Hadoop and Hive, though, is that together they form a full suite of storage, data flow execution, and a higher-level syntax that works today at scale. If you'd like to try your hand at implementing the Dryad model of query execution over HDFS and underneath HiveQL, that would certainly be an interesting project.
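For anyone who hasn't looked at Dryad: a job is a DAG whose vertices are computations and whose edges are data channels. A MapReduce job is then just one particular two-stage graph. The toy Python sketch that follows is a hypothetical illustration, not Dryad's, DryadLINQ's, or Hadoop's API, and every name in it is made up. It runs each vertex once, in a single process and in dependency order. The example graph has two input vertices that fan in to a filter stage, which feeds a count stage.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List

# Hypothetical toy model: a vertex takes the records arriving on its
# input edges and returns a list of output records.
Vertex = Callable[[List[object]], List[object]]


def run_dag(vertices: Dict[str, Vertex],
            edges: Dict[str, List[str]],
            sources: Dict[str, List[object]]) -> Dict[str, List[object]]:
    """Run every vertex once, upstream vertices first.

    edges maps a vertex name to the names of its upstream vertices
    (the "communication edges"); sources holds the initial input for
    vertices that read external data.
    """
    graph = {name: edges.get(name, []) for name in vertices}
    outputs: Dict[str, List[object]] = {}
    for name in TopologicalSorter(graph).static_order():
        # Concatenate external input with everything sent by upstream vertices.
        inputs = list(sources.get(name, []))
        for upstream in edges.get(name, []):
            inputs.extend(outputs[upstream])
        outputs[name] = vertices[name](inputs)
    return outputs


def split_words(lines: List[object]) -> List[object]:
    return [word for line in lines for word in str(line).split()]


# Two readers fan in to a filter, which feeds a counting vertex.
vertices: Dict[str, Vertex] = {
    "read_a": split_words,
    "read_b": split_words,
    "filter": lambda words: [w for w in words if len(str(w)) > 3],
    "count": lambda words: sorted(Counter(words).items()),
}
edges = {"filter": ["read_a", "read_b"], "count": ["filter"]}
sources = {"read_a": ["the quick brown fox"], "read_b": ["the lazy brown dog"]}

result = run_dag(vertices, edges, sources)
print(result["count"])  # [('brown', 2), ('lazy', 1), ('quick', 1)]
```

In a real engine, each vertex would be scheduled on a cluster machine and each edge would be a channel such as a file or a network pipe. This sketch captures only the dependency-ordered scheduling, not distribution or fault tolerance.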
Regards,
Jeff

On Thu, Oct 15, 2009 at 12:31 AM, Qing Yan <qing...@gmail.com> wrote:

> Hi,
>
> Has anyone looked into the Microsoft Dryad project?
>
> Their basic idea is using a DAG (connecting computational "vertices" with
> communication "edges") to model distributed computing flows. And they have
> something called DryadLINQ, which seems to be the Hive equivalent.
>
> Since the DAG model doesn't distinguish the inter-job (workflow) and
> intra-job (map/reduce, etc.) layers, their approach of doing query
> translation, workflow/job scheduling, and execution in one box may offer better
> optimization and fine-tuning opportunities compared to the Hadoop/Hive
> combo.
>
> Also, given that the majority of the hard work will be encapsulated and performed
> by the translation/optimizing layer, the simplicity and
> beauty of Map/Reduce becomes irrelevant or even a hindrance, because
> it doesn't permit more generic and flexible
> operations like Dryad does.
>
> Seems M$ got it right this time, at least on paper :-P ...thoughts?
>
> Qing