Hi
I use Hadoop for a MapReduce job in my system. I would like to have the
job run every 5 minutes. Is there any "distributed" timer facility in
Hadoop? Of course I could set up a timer in an external scheduler
(cron or something like that) that invokes the MapReduce job. But cron
only runs on one particular machine, so if that machine goes down
my job will not be triggered. I could then set up the timer on all or
many machines, but I do not want the job to run in more than one
instance per 5-minute interval, so the timers would then need to
coordinate who is actually starting the job "this time" and all the rest
would just have to do nothing. I guess I could come up with a solution to
that, e.g. by writing some "lock" logic using HDFS files or by using
ZooKeeper. But I would really like it if someone had already solved the
problem and provided some kind of "distributed timer framework"
running in a "cluster", so that I could just register a timer job with
the cluster and then be sure that it is invoked every 5 minutes, no
matter if one or two particular machines in the cluster are down.
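
To make the "lock" idea concrete, here is a rough sketch of what I have in mind, not a finished solution. Every machine runs the same timer, and whoever manages to create the lock file for the current 5-minute slot first starts the job; everyone else does nothing. The sketch uses a local directory as a stand-in for the shared store (in a real setup it would have to be something all machines see, such as HDFS with a create call that fails if the file already exists). The names slot_for, try_claim and run_if_elected are made up for illustration.

```python
import os
import tempfile

INTERVAL_SECONDS = 300  # five minutes


def slot_for(timestamp, interval=INTERVAL_SECONDS):
    """Return the index of the five-minute slot that contains timestamp."""
    return int(timestamp // interval)


def try_claim(lock_dir, slot):
    """Try to claim a slot by creating its lock file exclusively.

    O_CREAT | O_EXCL makes the create fail if the file already exists,
    so only the first caller for a given slot gets True.
    Assumption: lock_dir stands in for storage shared by all machines.
    """
    path = os.path.join(lock_dir, "slot-%d.lock" % slot)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run_if_elected(lock_dir, timestamp, job):
    """Run job() only if this caller wins the slot for timestamp."""
    if try_claim(lock_dir, slot_for(timestamp)):
        job()
        return True
    return False
```

Each machine's timer would call run_if_elected(shared_dir, time.time(), start_mapreduce_job), where start_mapreduce_job is whatever launches the job. The open issues I see are that the machines' clocks must agree on the slot, old lock files have to be cleaned up, and if the winner dies right after taking the lock, that run is lost.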
Any suggestions are very welcome.
Regards, Per Steffensen