Re: Hive vs. DryadLINQ

Jeff Hammerbacher Thu, 15 Oct 2009 10:45:01 -0700

Hey Qing,

You can download Dryad and see for yourself:
http://connect.microsoft.com/site/sitehome.aspx?SiteID=891. There's no
accompanying distributed file system, unfortunately, and I've never seen a
benchmark of Dryad scaling to more than 300 nodes, so it's not clear that
it's the "right" model for all workloads. There's certainly room for a
richer set of physical operators in the Hadoop project, but the nice thing
about Hadoop and Hive is that together they form a full suite of storage, data flow
execution, and a higher-level query syntax that works at scale today. If you'd
like to try your hand at an implementation of the Dryad model of query
execution over HDFS and underneath HiveQL, that would certainly be an
interesting project.
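
As a rough illustration of the Dryad model mentioned above, here is a minimal, single-process sketch. A job is a DAG whose vertices are computations and whose edges carry data from one vertex to the next. The `Graph` class and the `build_wordcount` helper are hypothetical names made up for this sketch. They are not Dryad's, DryadLINQ's, or Hadoop's API. In Dryad the vertices run on cluster machines and the edges are data channels, whereas here everything runs in one Python process. The example expresses a word count's map and reduce stages, plus a follow-on merge stage, as a single graph. A chain of Hadoop jobs would normally run that last stage as a separate job.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List


class Graph:
    """Hypothetical in-process DAG: vertices are functions, edges feed outputs forward."""

    def __init__(self) -> None:
        self.fns: Dict[str, Callable[..., object]] = {}
        self.inputs: Dict[str, List[str]] = defaultdict(list)  # dst -> ordered srcs

    def vertex(self, name: str, fn: Callable[..., object]) -> None:
        self.fns[name] = fn

    def edge(self, src: str, dst: str) -> None:
        self.inputs[dst].append(src)

    def run(self) -> Dict[str, object]:
        # Kahn's algorithm: a vertex runs once all of its upstream outputs exist.
        indeg = {v: len(self.inputs[v]) for v in self.fns}
        consumers: Dict[str, List[str]] = defaultdict(list)
        for dst, srcs in self.inputs.items():
            for s in srcs:
                consumers[s].append(dst)
        ready = deque(v for v, d in indeg.items() if d == 0)
        out: Dict[str, object] = {}
        while ready:
            v = ready.popleft()
            # Pass upstream outputs positionally, in the order the edges were added.
            out[v] = self.fns[v](*(out[s] for s in self.inputs[v]))
            for c in consumers[v]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        if len(out) != len(self.fns):
            raise ValueError("graph has a cycle")
        return out


def build_wordcount(partitions: List[str], n_reducers: int = 2) -> Graph:
    g = Graph()
    # "Map" vertices: one per input partition, each emitting one bucket per reducer.
    for i, text in enumerate(partitions):
        def mapper(text: str = text) -> list:
            buckets = [defaultdict(int) for _ in range(n_reducers)]
            for w in text.split():
                # Deterministic partitioning by the sum of character codes.
                buckets[sum(map(ord, w)) % n_reducers][w] += 1
            return buckets
        g.vertex(f"map{i}", mapper)
    # "Reduce" vertices: reducer r sums bucket r from every map vertex.
    for r in range(n_reducers):
        def reducer(*map_outs: list, r: int = r) -> dict:
            counts: Dict[str, int] = defaultdict(int)
            for buckets in map_outs:
                for w, c in buckets[r].items():
                    counts[w] += c
            return dict(counts)
        g.vertex(f"reduce{r}", reducer)
        for i in range(len(partitions)):
            g.edge(f"map{i}", f"reduce{r}")

    # Follow-on stage fed directly by the reducers, in the same graph.
    # Reducers own disjoint keys, so a plain dict update is enough here.
    def merge(*parts: dict) -> list:
        total: Dict[str, int] = {}
        for p in parts:
            total.update(p)
        return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))

    g.vertex("merge", merge)
    for r in range(n_reducers):
        g.edge(f"reduce{r}", "merge")
    return g


if __name__ == "__main__":
    result = build_wordcount(["a b a", "b c a"]).run()
    print(result["merge"])  # [('a', 3), ('b', 2), ('c', 1)]
```

MapReduce is just one fixed DAG shape here (map vertices, an all-to-all edge set, reduce vertices). A more general planner could add vertices such as `merge` wherever it likes instead of starting a new job.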


Regards,
Jeff

On Thu, Oct 15, 2009 at 12:31 AM, Qing Yan <qing...@gmail.com> wrote:

> Hi,
>
>    Has anyone looked into the Microsoft Dryad project?
>
>    Their basic idea is to use a DAG (connecting computational "vertices" with
> communication "edges") to model distributed computing flows. They also have
> something called DryadLINQ, which seems to be the Hive equivalent.
>
>      Since the DAG model doesn't distinguish between the inter-job (workflow)
> and intra-job (map/reduce, etc.) layers, their approach of doing query
> translation, workflow/job scheduling, and execution in one box may offer
> better optimization and fine-tuning opportunities than the Hadoop/Hive
> combo.
>
>    Also, given that the majority of the hard work will be encapsulated and
> performed by the translation/optimizing layer, the simplicity and beauty of
> Map/Reduce become irrelevant or even a hindrance, because Map/Reduce doesn't
> permit the more generic and flexible operations that Dryad does.
>
>
>   Seems M$ got it right this time, at least on paper :-P ... Thoughts?
>
>  Qing
