Hi Manoj,
Reply inline.
On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu manoj...@gmail.com wrote:
Hi All,
Normal Hadoop job submission process involves:
Checking the input and output specifications of the job.
Computing the InputSplits for the job.
Setting up the requisite accounting information.
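For reference, all three of those steps happen inside JobClient.runJob() itself; a minimal old-API driver (class and argument names hypothetical) would look roughly like:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ExampleDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ExampleDriver.class);
    conf.setJobName("example");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // runJob() checks the I/O specs (failing fast if, e.g., the output
    // path already exists), computes the InputSplits, and sets up the
    // job's accounting information before submitting it to the cluster.
    JobClient.runJob(conf);
  }
}
```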
Hi,
I have an HDFS folder and M/R job that periodically updates it by replacing the
data with newly generated data.
I have a different M/R job that, periodically or ad hoc, processes the data in
the folder.
The second job, naturally, sometimes fails when the data is replaced by newly
generated data.
How about introducing a distributed coordination and locking mechanism?
ZooKeeper would be a good candidate for that kind of thing.
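A sketch of what that could look like with Apache Curator (the connection string and lock path below are hypothetical; InterProcessMutex implements the standard ZooKeeper lock recipe):

```java
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class FolderLockExample {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zkhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();
    // Both the writer job and the reader job would take this lock
    // around their access to the shared HDFS folder.
    InterProcessMutex lock = new InterProcessMutex(client, "/locks/data-folder");
    if (lock.acquire(30, TimeUnit.SECONDS)) {
      try {
        // read or replace the folder contents here
      } finally {
        lock.release();
      }
    }
    client.close();
  }
}
```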
On Mon, Aug 13, 2012 at 12:52 PM, David Ginzburg ginz...@hotmail.com wrote:
Hi,
I have an HDFS folder and M/R job that periodically updates it by
replacing the
Hi Harsh,
Thanks for your reply.
Consider that from my main program I am doing
many activities (reading/writing/updating, non-Hadoop activities) before
invoking JobClient.runJob(conf).
Is there any way to separate the process flow programmatically instead of going
for a workflow engine?
Cheers!
Manoj.
Sure, you may separate the logic as you want it to be, but just ensure
the configuration object has a proper setJar or setJarByClass done on
it before you submit the job.
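Concretely, that could look like this (class names hypothetical): the non-Hadoop work can live wherever you like, as long as the jar is set on the configuration before submission so the tasks can locate your classes.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MainProgram {
  public static void main(String[] args) throws Exception {
    doNonHadoopWork();  // reading/writing/updating, etc.

    JobConf conf = new JobConf();
    conf.setJarByClass(MainProgram.class);  // or conf.setJar("path/to/job.jar")
    // ... set input/output paths, formats, mapper/reducer here ...
    JobClient.runJob(conf);
  }

  private static void doNonHadoopWork() {
    // anything that does not touch Hadoop
  }
}
```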
On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu manoj...@gmail.com wrote:
Hi Harsh,
Thanks for your reply.
Consider from my
David,
While ZK can solve this, locking may only make you slower. Let's try to
keep it simple?
Have you considered keeping two directories? One where the older data
is moved to (by the first job, instead of replacing files), for
consumption by the second job, which triggers by watching this
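The rotation Harsh describes could be sketched like this (the paths are hypothetical): rename() is a cheap namespace-only operation in HDFS, so the consuming job sees either the old snapshot or the new one, never a half-replaced folder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RotateDataDirs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path fresh = new Path("/data/incoming");  // written by the first job
    Path ready = new Path("/data/ready");     // consumed by the second job

    // Drop the previous snapshot, then move the new data into place.
    if (fs.exists(ready)) {
      fs.delete(ready, true);
    }
    if (!fs.rename(fresh, ready)) {
      throw new RuntimeException("rename failed: " + fresh + " -> " + ready);
    }
    fs.close();
  }
}
```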