[ https://issues.apache.org/jira/browse/CRUNCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071647#comment-14071647 ]
David Whiting commented on CRUNCH-441:
--------------------------------------
After two of us spent a full day on it, we determined the following:
- Tez DAGs map reasonably well to the graphs built by Crunch's MapReduce
implementation, so in theory there's no reason why this shouldn't be possible.
- Tez's API is much lower-level than we expected, so the implementation might
well be more complex than we anticipated. Tez does appear to have a slightly
higher-level Map-Reduce-Reduce implementation that could make the transition
easier for Crunch, but information about it was difficult to find.
- The Tez API does not yet seem particularly stable.
We probably won't have time to look at this again for a while, so if anyone
wants to pick up the baton it'd be really great. There are presumably
implementations of similar things around (such as in the Hive source and in a
Cascading branch somewhere) that could be used for reference. Otherwise, maybe
we'll take another look when the API and docs seem a bit more stable and
complete.
> Crunch on Tez
> -------------
>
> Key: CRUNCH-441
> URL: https://issues.apache.org/jira/browse/CRUNCH-441
> Project: Crunch
> Issue Type: Improvement
> Reporter: David Whiting
>
> Tez is potentially a better drop-in replacement for MR than Spark on many
> existing Hadoop environments, because it doesn't require always-on resources
> and is less memory-hungry than Spark, whilst still providing huge performance
> gains, as can be seen in new versions of Hive.