Thanks for your answer, Alejandro. I am still quite confused, actually: in a complex production environment where several clients run their own workflows independently, it will be very difficult to coordinate everyone's workflows and make sure there are enough map task slots to avoid a deadlock. Since workflows can be triggered by different conditions and start at different times, I can't see how this could be achieved easily. Is there really no other way to avoid deadlocks?
At the moment, I still see this issue as a strong limitation of Oozie, but maybe I am still missing something.

Cheers,
Michael

-----Original Message-----
From: Alejandro Abdelnur [mailto:[email protected]]
Sent: Tuesday, June 5, 2012 19:11
To: [email protected]
Subject: Re: Oozie, PIG and maximum map tasks

Hi Michael,

If you want to run Pig jobs in parallel, you have to ensure your cluster has enough task slots for the maximum number of tasks your Pig jobs will run concurrently plus the number of Oozie actions you are running concurrently (each Oozie action requires one task slot).

You can increase the number of tasks as you suggested, but you have to be careful not to exceed the capacity of your box. After you do that, you have to add more nodes to your cluster.

thx

On Tue, Jun 5, 2012 at 1:01 AM, <[email protected]> wrote:

> Dear all,
>
> I am new to Oozie and I am facing a problem with my workflows. I am mostly
> interested in using Oozie to schedule PIG scripts.
>
> So far, I have managed to run the tutorial examples fine and to create my
> first basic workflows to launch PIG actions.
> But when I started to play with more complex situations where, for example,
> I need to run two PIG scripts in parallel, the PIG jobs never terminate and
> stay in RUNNING status until I finally kill them.
>
> I am running my workflows on a Cloudera VM and I found that increasing the
> maximum number of map tasks definitely solves my problem
> (mapred.tasktracker.map.tasks.maximum was set to 2 and I increased it to
> 5). Now everything seems to run rather smoothly!
>
> However, I am still uncomfortable with this fix because it seems to me
> that there is still something wrong. Let's say I have
> mapred.tasktracker.map.tasks.maximum = N. Let's say that I have N workflows
> (alternatively, N parallel branches in a workflow) that are triggered at the
> same time. Then N Oozie jobs will be started, each of them trying to run a
> PIG action. Again, on my platform, the system will enter a sort of
> deadlock situation because there will be no map task slots available for
> the PIG scripts to complete.
>
> So my question is: am I missing something? Is there a way to avoid such
> deadlock situations?
>
> Thanks for your answers,
> and please accept my apologies if my question turns out to be completely
> silly ...
>
> Michael
>
> _________________________________________________________________________________________________________________________
>
> This message and its attachments may contain confidential or privileged
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and
> delete this message and its attachments.
> As emails may be altered, France Telecom - Orange is not liable for
> messages that have been modified, changed or falsified.
> Thank you.

--
Alejandro
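
To make the slot arithmetic from this thread concrete, here is a minimal Python sketch. It is not Oozie or Hadoop code, and the function names `required_map_slots` and `can_deadlock` are hypothetical. It assumes, as described above, that every running Oozie action holds one map slot for as long as its Pig job runs, and that the cluster is a single node, so the total number of map slots equals mapred.tasktracker.map.tasks.maximum.

```python
def required_map_slots(concurrent_actions: int, max_pig_map_tasks: int) -> int:
    """Map slots needed so that every Oozie action and every Pig map task
    can run at the same time (one slot per Oozie action, as stated in the thread)."""
    return concurrent_actions + max_pig_map_tasks


def can_deadlock(total_map_slots: int, concurrent_actions: int) -> bool:
    """Return True if the Oozie actions alone can occupy every map slot.

    In that case each action waits for its Pig job, while no Pig job can
    obtain a slot, so nothing ever finishes.
    """
    return concurrent_actions >= total_map_slots


if __name__ == "__main__":
    # Single-node VM from the thread: 2 map slots, 2 parallel Pig actions.
    print(can_deadlock(total_map_slots=2, concurrent_actions=2))
    # After raising mapred.tasktracker.map.tasks.maximum to 5.
    print(can_deadlock(total_map_slots=5, concurrent_actions=2))
    # Slots needed for 2 actions whose Pig jobs run up to 3 map tasks in total.
    print(required_map_slots(concurrent_actions=2, max_pig_map_tasks=3))
```

Under these assumptions, N slots with N actions triggered at the same time always makes `can_deadlock` return True, which is exactly the situation Michael describes. On a multi-node cluster the total would be the sum of the map slots over all tasktrackers rather than the per-node setting.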
