Ok, that explains a lot! Thanks guys! :)

2011/9/29 Joey Echeverria <j...@cloudera.com>
> > The question is: the intermediary (before any reducer) results of
> > completed individual tasks are recorded in the HDFS, right? So why are
> > these results discarded, since the loss of the tasktracker is not the
> > loss of already processed data?
>
> Intermediate results are stored on the local disks and served up via
> an embedded jetty HTTP server. If the tasktracker goes down, so does
> the embedded HTTP server.
>
> -Joey
>
> On Thu, Sep 29, 2011 at 12:59 PM, Leonardo Gamas
> <leoga...@jusbrasil.com.br> wrote:
> > No, the reducers are fine, or at least I didn't observe any problem.
> >
> > The question is: the intermediary (before any reducer) results of
> > completed individual tasks are recorded in the HDFS, right? So why are
> > these results discarded, since the loss of the tasktracker is not the
> > loss of already processed data?
> >
> > --Leonardo Gamas
> >
> > 2011/9/29 Robert Evans <ev...@yahoo-inc.com>
> >>
> >> If a TaskTracker is lost then it cannot serve up any Map results to
> >> Reducers that will need them, so the Map tasks have to be rerun. I am
> >> not sure if this is the behavior you are seeing or not. Are completed
> >> Reducers being rerun as well?
> >>
> >> --Bobby Evans
> >>
> >> On 9/29/11 11:15 AM, "Leonardo Gamas" <leoga...@jusbrasil.com.br>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I have a very large MapReduce job and sometimes a TaskTracker doesn't
> >> send a heartbeat in the preconfigured amount of time, so it's
> >> considered dead. That's OK, but all tasks already finished by this
> >> TaskTracker are lost too, or better explained, are rescheduled and
> >> re-executed by another TaskTracker.
> >>
> >> Is this the default behavior, or am I experiencing some bug or
> >> misconfiguration?
> >>
> >> My regards,
> >>
> >> Leonardo Gamas
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
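Joey's explanation can be sketched in miniature. This is not Hadoop code; it is a toy Python simulation (all class and function names are invented for illustration) of the key fact in the thread: map outputs are written only to the serving node's local disk and fetched over that node's embedded HTTP server, so when the node is declared dead its completed map tasks become unreachable and must be rescheduled, while tasks on healthy nodes are unaffected.

```python
# Toy model of the TaskTracker failure scenario discussed above.
# Illustrative only -- none of these names are real Hadoop APIs.

class TaskTracker:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.local_map_outputs = {}  # task_id -> intermediate data on local disk

    def run_map(self, task_id, data):
        # Intermediate output goes to local disk, NOT to HDFS.
        self.local_map_outputs[task_id] = sorted(data)

    def serve_output(self, task_id):
        # Stands in for the embedded jetty HTTP server: it dies with the node.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.local_map_outputs[task_id]

def fetch_for_reduce(assignments):
    """Reducer-side shuffle: fetch each map output; any task whose
    output is unreachable is marked for re-execution elsewhere."""
    fetched, to_rerun = {}, []
    for task_id, tracker in assignments.items():
        try:
            fetched[task_id] = tracker.serve_output(task_id)
        except ConnectionError:
            to_rerun.append(task_id)
    return fetched, to_rerun

tt1, tt2 = TaskTracker("tt1"), TaskTracker("tt2")
tt1.run_map("m0", [3, 1, 2])
tt2.run_map("m1", [6, 5, 4])
tt2.alive = False  # missed heartbeats: declared dead

fetched, to_rerun = fetch_for_reduce({"m0": tt1, "m1": tt2})
# m0's output is still served by tt1; m1 must be re-executed,
# even though the map task itself had already completed.
```

The sketch shows why writing the intermediate results to HDFS would not be "free": Hadoop deliberately keeps map output on local disk to avoid the replication cost, accepting re-execution as the recovery mechanism when a node is lost.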