Hi

I use Hadoop for a MapReduce job in my system, and I would like the job to run every 5 minutes. Is there any kind of "distributed" timer facility in Hadoop?

Of course I could set up a timer in an external scheduler (cron or something like that) that invokes the MapReduce job. But cron runs on only one particular machine, so if that machine goes down my job will not be triggered. I could set up the timer on all or many machines, but I do not want more than one instance of the job to run in each 5-minute interval. The timer jobs would then need to coordinate which one actually starts the job "this time", and all the rest would do nothing.

I guess I could come up with a solution to that, e.g. by writing some "lock" logic using HDFS files or ZooKeeper. But I would really prefer it if someone had already solved the problem and provided some kind of "distributed timer framework" running in a "cluster", so that I could just register a timer job with the cluster and be sure that it is invoked every 5 minutes, even if one or two particular machines in the cluster are down.
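To make the "lock" idea concrete, here is a minimal sketch of what I have in mind. It is only an illustration under assumptions: the lock directory is a local directory standing in for a shared one (on HDFS I would expect to use an exclusive, no-overwrite file create instead), and the names slot_id, try_claim_slot and run_if_leader are made up for this example. Every machine would run it from cron every 5 minutes, and only the machine that manages to create the lock file for the current 5-minute slot starts the job.

```python
import os
import time

SLOT_SECONDS = 300  # the 5-minute period from the question


def slot_id(now, period=SLOT_SECONDS):
    """Return the index of the time slot that `now` (seconds) falls into."""
    return int(now // period)


def try_claim_slot(lock_dir, now, period=SLOT_SECONDS):
    """Try to claim the slot containing `now`.

    Returns True if this caller created the lock file and should start
    the job, False if another caller already claimed the slot.
    O_CREAT | O_EXCL makes the create fail when the file already exists.
    Assumption: lock_dir is shared and honours exclusive create.
    """
    path = os.path.join(lock_dir, "slot-%d.lock" % slot_id(now, period))
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run_if_leader(lock_dir, start_job, now=None):
    """Call start_job() only if this caller wins the claim for the slot."""
    now = time.time() if now is None else now
    if try_claim_slot(lock_dir, now):
        start_job()
        return True
    return False
```

This leaves open exactly the things I would rather not solve myself: if the winning machine dies right after claiming the slot, that run is skipped; the machines need reasonably synchronized clocks to agree on the slot; and old lock files have to be cleaned up by something.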

Any suggestions are very welcome.

Regards, Per Steffensen
