In Hadoop, if the client that triggers the job fails, is there a way to recover and have another client submit the job?
On Thu, Sep 1, 2011 at 8:44 PM, Per Steffensen <st...@designware.dk> wrote:

> Well, I am not sure I get you right, but anyway: basically I want a timer
> framework that triggers my jobs, and the triggering of the jobs needs to
> work even if one or two particular machines go down. So the "timer
> triggering mechanism" has to live in the cluster, so to speak. What I don't
> want is a timer framework driven from one particular machine, so that the
> triggering of jobs will not happen if that machine goes down. Basically, if
> I have e.g. 10 machines in a Hadoop cluster, I will be able to run e.g.
> MapReduce jobs even if 3 of the 10 machines are down. I want my timer
> framework to also be clustered, distributed and coordinated, so that my
> timer jobs are triggered even if 3 out of 10 machines are down.
>
> Regards, Per Steffensen
>
> Ronen Itkin skrev:
>
>> If I get you right, you are asking about installing Oozie as a distributed
>> and/or HA cluster?
>> In that case I am not familiar with an out-of-the-box solution by Oozie.
>> But I think you can make up a solution of your own, for example:
>> install Oozie on two servers on the same partition, synchronized by DRBD.
>> You can trigger a failover using Linux Heartbeat and that way maintain a
>> virtual IP.
>>
>> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk>
>> wrote:
>>
>>> Hi
>>>
>>> Thanks a lot for pointing me to Oozie. I have looked a little bit into
>>> Oozie, and it seems like the "component" triggering jobs is called the
>>> "Coordinator Application". But I see nowhere that this Coordinator
>>> Application doesn't just run on a single machine, and that it will
>>> therefore not trigger anything if this machine is down. Can you confirm
>>> that the "Coordinator Application" role is distributed in a distributed
>>> Oozie setup, so that jobs get triggered even if one or two machines are
>>> down?
>>>
>>> Regards, Per Steffensen
>>>
>>> Ronen Itkin skrev:
>>>
>>>> Hi
>>>>
>>>> Try to use Oozie for job coordination and workflows.
>>>>
>>>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I use Hadoop for a MapReduce job in my system. I would like to have
>>>>> the job run every 5th minute. Is there any "distributed" timer job
>>>>> facility in Hadoop?
>>>>> Of course I could set up a timer in an external timer framework (cron
>>>>> or something like that) that invokes the MapReduce job. But cron only
>>>>> runs on one particular machine, so if that machine goes down my job
>>>>> will not be triggered. Then I could set up the timer on all or many
>>>>> machines, but I would not like the job to run in more than one
>>>>> instance every 5th minute, so the timer jobs would need to coordinate
>>>>> who actually starts the job "this time" and all the rest would just
>>>>> do nothing.
>>>>> I guess I could come up with a solution to that - e.g. writing some
>>>>> "lock" stuff using HDFS files or by using ZooKeeper. But I would
>>>>> really like it if someone had already solved the problem and provided
>>>>> some kind of "distributed timer framework" running in a cluster, so
>>>>> that I could just register a timer job with the cluster and then be
>>>>> sure that it is invoked every 5th minute, no matter if one or two
>>>>> particular machines in the cluster are down.
>>>>>
>>>>> Any suggestions are very welcome.
>>>>>
>>>>> Regards, Per Steffensen

-- 
Regards, Tharindu
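For reference, the "every 5th minute" schedule discussed in the thread maps onto an Oozie coordinator definition. A minimal sketch (the app name, HDFS path and start/end times below are placeholders, not from the thread):

```xml
<coordinator-app name="every-5-min"
                 frequency="${coord:minutes(5)}"
                 start="2011-09-01T00:00Z" end="2012-09-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- placeholder path to the workflow that launches the MapReduce job -->
      <app-path>hdfs://namenode/user/per/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Note that this does not by itself address Per's HA concern: the coordinator is materialized by the Oozie server, which is why Ronen suggests a DRBD/Heartbeat failover pair for the server itself.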
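As for the "lock" approach mentioned in the original mail: the key primitive is an atomic create, which both HDFS (`FileSystem.createNewFile()`) and ZooKeeper (ephemeral znodes) provide. Below is a minimal sketch of the idea in Python, using the local filesystem's `O_EXCL` as a stand-in for the shared atomic create; the function name, lock-file naming and slot scheme are made up for illustration:

```python
import errno
import os

def try_acquire_slot(lock_dir, slot):
    """Atomically claim a per-interval lock file; only one claimant succeeds.

    `slot` identifies the 5-minute interval, e.g. int(time.time()) // 300,
    so every interval gets a fresh lock file and exactly one winner.
    """
    path = os.path.join(lock_dir, "timer-%d.lock" % slot)
    try:
        # O_CREAT | O_EXCL fails if the file already exists -- the same
        # atomic-create semantics HDFS createNewFile() or a ZooKeeper
        # ephemeral znode would give across machines.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True   # we won: this node triggers the job for this slot
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False  # another node already claimed this interval
        raise
```

Every machine would run the timer; each period they compute the current slot and call `try_acquire_slot`, and only the winner submits the MapReduce job. For this to actually survive machine failures, the atomic create must of course happen on shared storage (HDFS) or in ZooKeeper, not on a local disk as in this sketch.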