Hi Markus,

Thanks for chiming in :) My responses are below.

On 2020/12/21 21:32:08, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hello Lewis,
>
> 1. counters, for me they are a requirement to have as they are key to regular
> inspections of ongoing crawls, finding errors and debugging. I hope you can
> find a work around.

I totally agree. Please see the observed issues I documented at https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez#RunningNutchonTez-ObservedIssues

>
> 2. sounds interesting, but i'd like to see the test run with 12M rather than
> 12k URLs.

Please see https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez#RunningNutchonTez-RunningtheInjectorjobonTez

>
> A question, are the produced files with Tez compatible with MapReduce
> programs, map and sequence files?

Having consulted with the Tez committers (https://s.apache.org/aiw8o), it appears that Tez may not yet support some less commonly used MapReduce features. However, I have yet to encounter any issues along those lines.

> It would be a tremendous advantage if existing programs can work with it.

I agree. So far the results look promising.

> It would be a real pain to have to rewrite all code in one go. We have seen
> that lead to a dead end many times, including our 2.x-branch.

Yes, I'm intrigued to see how things progress, although I am not yet 100% sure what code rewriting would be required. I am still learning how our MapReduce jobs would be written natively using the Tez DAG API.