[jira] [Commented] (CRUNCH-441) Crunch on Tez

David Whiting (JIRA) Wed, 23 Jul 2014 05:13:55 -0700

    [ https://issues.apache.org/jira/browse/CRUNCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071647#comment-14071647 ]


David Whiting commented on CRUNCH-441:
--------------------------------------

After two of us spent a full day on it, we concluded the following:

- Tez DAGs map reasonably well to the graphs built by Crunch's MapReduce 
implementation, so there's no reason why this shouldn't be possible in theory.
- Tez's API is much lower level than we expected, so the implementation might 
well be more complex than we anticipated. Tez does appear to have a slightly 
higher-level Map-Reduce-Reduce implementation that could ease the transition 
for Crunch, but information about it was hard to find.
- The Tez API does not seem to be particularly stable yet.
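To illustrate why a Map-Reduce-Reduce model might ease the transition, here is a minimal sketch of collapsing a linear chain of MapReduce jobs, as a Crunch-style plan might produce, into a single chain of DAG vertices (map, reduce, reduce, ...). The `MrJob` and `Vertex` types and the `toMrr` method are hypothetical and are not part of Crunch's or Tez's API. The sketch assumes every job after the first has an identity map, so that job's map phase can be dropped and its reduce attached directly to the previous reduce.

```java
import java.util.ArrayList;
import java.util.List;

public class MrrSketch {
    // Hypothetical: one MapReduce job in a Crunch-style plan, identified by
    // the names of its map and reduce functions.
    public record MrJob(String map, String reduce) {}

    // Hypothetical: one vertex in the resulting DAG and the function it runs.
    public record Vertex(String name, String function) {}

    // Collapses a linear chain of MR jobs into map -> reduce -> reduce ... vertices.
    // Assumption: only the first job has a real map; later jobs must use "identity".
    public static List<Vertex> toMrr(List<MrJob> chain) {
        List<Vertex> vertices = new ArrayList<>();
        for (int i = 0; i < chain.size(); i++) {
            MrJob job = chain.get(i);
            if (i == 0) {
                // The only map vertex in the DAG comes from the first job.
                vertices.add(new Vertex("map-0", job.map()));
            } else if (!job.map().equals("identity")) {
                // This simple sketch cannot fold a non-identity map into the chain.
                throw new IllegalArgumentException("job " + i + " has a non-identity map");
            }
            // Each job contributes one reduce vertex, fed by the previous vertex.
            vertices.add(new Vertex("reduce-" + i, job.reduce()));
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<MrJob> chain = List.of(
                new MrJob("tokenize", "countWords"),
                new MrJob("identity", "topN"));
        for (Vertex v : toMrr(chain)) {
            System.out.println(v.name() + " -> " + v.function());
        }
    }
}
```

Under classic MapReduce this chain would be two jobs, with the intermediate output written to HDFS and an identity map in the second job; in the collapsed form it becomes three vertices in one DAG.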

We probably won't have time to look at this again for a while, so it would be 
great if someone else wants to take the baton. There are presumably 
implementations of similar things around (such as in the Hive source and in a 
Cascading branch somewhere) that could be used for reference; otherwise we may 
take another look once the API and docs are more stable and complete.

> Crunch on Tez
> -------------
>
>                 Key: CRUNCH-441
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-441
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: David Whiting
>
> Tez is potentially a better drop-in replacement for MR than Spark on many 
> existing Hadoop environments, because it doesn't require always-on resources 
> and is less memory-hungry than Spark whilst still providing huge performance 
> gains as can be seen in new versions of Hive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)