Re: RE: [DISCUSS] Replacing MapReduce with Tez

Lewis John McGibbney Mon, 21 Dec 2020 19:59:31 -0800

Hi Markus,
Thanks for chiming in :)
My responses are below.

On 2020/12/21 21:32:08, Markus Jelsma <[email protected]> wrote: 
> Hello Lewis,
> 
> 1. counters, for me they are a requirement to have as they are key to regular 
> inspections of ongoing crawls, finding errors and debugging. I hope you can 
> find a work around.


I totally agree. Please see the observed issues I documented at 
https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez#RunningNutchonTez-ObservedIssues

> 
> 2. sounds interesting, but i'd like to see the test run with 12M rather than 
> 12k URLs.

Please see 
https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez#RunningNutchonTez-RunningtheInjectorjobonTez

> 
> A question, are the produced files with Tez compatible with MapReduce 
> programs, map and sequence files?

Having consulted with the Tez committers (https://s.apache.org/aiw8o), it 
appears there may be some less popular MapReduce features that Tez does not 
yet support. However, I have yet to encounter any issues along those lines.

> It would be a tremendous advantage if existing programs can work with it. 

I agree... so far the results look promising.

> It would be a real pain to have to rewrite all code in one go. We have seen 
> that lead to a dead end many times, including our 2.x-branch.

Yes, I'm intrigued to see how things progress, although I am still not 100% 
sure what code rewriting would be required. I am still learning how our 
MapReduce jobs would be written natively using the Tez DAG API.
