Re: [galaxy-dev] job status when SGE kills/aborts job
On Jul 30, 2011, at 4:58 AM, Peter Cock wrote:

> On Saturday, July 30, 2011, Shantanu Pavgi wrote:
> >
> > The -V option is not for verbose mode but for exporting your
> > shell environment. Refer to the qsub manual for details:
> > "Specifies that all environment variables active within the
> > qsub utility be exported to the context of the job."
> > We are already using it in our configuration as needed.
>
> That is such a common need it would be great to have
> it in the Galaxy documentation as an example of using
> native SGE options with drmaa:// in universe_wsgi.ini,
> plus the http://linux.die.net/man/5/sge_complex link.
>
> Thanks!
>
> > I think we are having a problem with Galaxy (or the drmaa
> > Python lib) parsing the correct drmaa/SGE messages, and
> > not with the drmaa URL configuration. Thoughts?
>
> I'd try adding a few debug log/print statements to the
> code to try and diagnose it.
>
> Peter

The option is the same for Torque/PBS as well. We'll have the chance (or misfortune, depending on how you look at it) of testing both SGE and Torque locally.

chris

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
[galaxy-dev] job status when SGE kills/aborts job
On Saturday, July 30, 2011, Shantanu Pavgi wrote:
>
> The -V option is not for verbose mode but for exporting your
> shell environment. Refer to the qsub manual for details:
> "Specifies that all environment variables active within the
> qsub utility be exported to the context of the job."
> We are already using it in our configuration as needed.

That is such a common need it would be great to have it in the Galaxy documentation as an example of using native SGE options with drmaa:// in universe_wsgi.ini, plus the http://linux.die.net/man/5/sge_complex link.

Thanks!

> I think we are having a problem with Galaxy (or the drmaa
> Python lib) parsing the correct drmaa/SGE messages, and
> not with the drmaa URL configuration. Thoughts?

I'd try adding a few debug log/print statements to the code to try and diagnose it.

Peter
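[Editor's note] The debug statements Peter suggests would go into Galaxy's drmaa job runner. As an illustration only: the field names below (wasAborted, hasSignal, hasExited, exitStatus) follow the JobInfo value returned by drmaa-python's Session.wait(), but classify_job_info() is a hypothetical helper for this sketch, not actual Galaxy code.

```python
# Sketch: decide whether a finished drmaa job really succeeded, based on
# the JobInfo returned by drmaa-python's Session.wait(). Logging this
# result is the kind of debug output that could reveal why killed jobs
# show up green in Galaxy. JobInfo is stubbed here so the sketch runs
# without a cluster.
from collections import namedtuple

JobInfo = namedtuple(
    "JobInfo",
    ["wasAborted", "hasSignal", "terminatedSignal", "hasExited", "exitStatus"],
)

def classify_job_info(info):
    """Return 'ok' or 'error' for a completed drmaa job."""
    if info.wasAborted:
        return "error"   # aborted, e.g. qdel or a scheduler kill
    if info.hasSignal:
        return "error"   # terminated by a signal, e.g. SIGKILL when h_vmem is exceeded
    if info.hasExited and info.exitStatus == 0:
        return "ok"      # exited normally with status 0
    return "error"       # non-zero exit status or unknown termination

print(classify_job_info(JobInfo(False, False, None, True, 0)))          # ok
print(classify_job_info(JobInfo(False, True, "SIGKILL", False, None)))  # error
```

If the runner only checks that the job reached a "done" state without inspecting these fields, a job the scheduler killed would look identical to a successful one, which matches the symptom described in this thread.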
Re: [galaxy-dev] job status when SGE kills/aborts job
On Jul 29, 2011, at 8:03 PM, ambarish biswas wrote:

> With Regards,
> Ambarish Biswas,
> University of Otago, Department of Biochemistry,
> Dunedin, New Zealand
> Tel: +64(22)0855647
> Fax: +64(0)3 479 7866
>
> Hi, have you tested the option drmaa://-q galaxy -V/ yet? As it
> suggests, -q galaxy requests the queue named "galaxy"; I'm not sure
> what -V stands for, but it could be verbose.

The -V option is not for verbose mode but for exporting your shell environment. Refer to the qsub manual for details: "Specifies that all environment variables active within the qsub utility be exported to the context of the job." We are already using it in our configuration as needed.

I think we are having a problem with Galaxy (or the drmaa Python lib) parsing the correct drmaa/SGE messages, and not with the drmaa URL configuration. Thoughts?

--
Shantanu.
Re: [galaxy-dev] job status when SGE kills/aborts job
On Jul 29, 2011, at 4:13 PM, Ka Ming Nip wrote:

> Hi Shantanu,
>
> I am also using an SGE cluster and the DRMAA runner for my Galaxy
> install. I am also having the same issue for jobs that were killed.
>
> How did you define the run-time or memory configurations in your
> DRMAA URLs?
>
> I had to add "-w n" to the DRMAA URLs in order for my jobs to be
> dispatched to the cluster. However, someone said (on another thread)
> that doing so might hide the errors. I am not sure if this is the
> cause, since my jobs won't be dispatched at all if "-w n" is not in
> the DRMAA URLs.
>
> Ka Ming

The drmaa/SGE URL in our configuration looks something like this:

{{{
drmaa://-V -m be -M <email> -l vf=<mem>,h_rt=<time>,s_rt=<time>,h_vmem=<mem>/
}}}

We don't use the "-w n" option in our configuration. The "-w n" option turns off validation of your job script; refer to the qsub manual for details. The -l options (complex configuration attributes) can be found here: http://linux.die.net/man/5/sge_complex

Hope this helps you.

--
Shantanu.
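[Editor's note] For a concrete picture of where such a URL lives: in Galaxy installs of this era, per-tool runners were configured in the [galaxy:tool_runners] section of universe_wsgi.ini. The tool ids, email address, and every limit value below are made-up illustrations, not values from this thread.

```ini
; Hypothetical example of tool-specific drmaa runner URLs in
; universe_wsgi.ini. All tool ids, addresses, and limits are illustrative.
[galaxy:tool_runners]
; default: just export the shell environment to the job (-V)
default = drmaa://-V/
; a heavier tool: mail on begin/end (-m be -M), request 4G up front (vf),
; and cap run-time (h_rt/s_rt) and virtual memory (h_vmem) via -l complexes
some_heavy_tool = drmaa://-V -m be -M admin@example.org -l vf=4G,h_rt=12:00:00,s_rt=11:50:00,h_vmem=8G/
```

Note that everything between drmaa:// and the trailing / is passed to the DRM as native qsub-style options, which is why SGE complexes like h_vmem can be set per tool here.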
Re: [galaxy-dev] job status when SGE kills/aborts job
With Regards,
Ambarish Biswas,
University of Otago, Department of Biochemistry,
Dunedin, New Zealand
Tel: +64(22)0855647
Fax: +64(0)3 479 7866

Hi, have you tested the option drmaa://-q galaxy -V/ yet? As it suggests, -q galaxy requests the queue named "galaxy"; I'm not sure what -V stands for, but it could be verbose.

On Sat, Jul 30, 2011 at 9:13 AM, Ka Ming Nip wrote:

> Hi Shantanu,
>
> I am also using an SGE cluster and the DRMAA runner for my Galaxy
> install. I am also having the same issue for jobs that were killed.
>
> How did you define the run-time or memory configurations in your
> DRMAA URLs?
>
> I had to add "-w n" to the DRMAA URLs in order for my jobs to be
> dispatched to the cluster. However, someone said (on another thread)
> that doing so might hide the errors. I am not sure if this is the
> cause, since my jobs won't be dispatched at all if "-w n" is not in
> the DRMAA URLs.
>
> Ka Ming
Re: [galaxy-dev] job status when SGE kills/aborts job
Hi Shantanu,

I am also using an SGE cluster and the DRMAA runner for my Galaxy install. I am also having the same issue for jobs that were killed.

How did you define the run-time or memory configurations in your DRMAA URLs?

I had to add "-w n" to the DRMAA URLs in order for my jobs to be dispatched to the cluster. However, someone said (on another thread) that doing so might hide the errors. I am not sure if this is the cause, since my jobs won't be dispatched at all if "-w n" is not in the DRMAA URLs.

Ka Ming

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Shantanu Pavgi [pa...@uab.edu]
Sent: July 29, 2011 1:56 PM
To: galaxydev psu
Subject: [galaxy-dev] job status when SGE kills/aborts job

We are using an SGE cluster with our Galaxy install. We have specified resource and run-time limits for certain tools using tool-specific drmaa URL configuration, e.g.:

- run-time (h_rt, s_rt)
- memory (vf, h_vmem)

This helps the scheduler submit jobs to an appropriate node and also prevents a node from crashing because of excessive memory consumption. However, sometimes a job needs more resources and/or run-time than specified in the drmaa URL configuration. In such cases SGE kills the job and we get an email notification with an appropriate job summary. However, the Galaxy web interface doesn't show any error for such failures, and the job table doesn't contain any related state/info either. The jobs are shown in green boxes, meaning they completed without any failure. In reality these jobs have been killed/aborted by the scheduler. This is really confusing, as there is an inconsistency between the job status reported by Galaxy and by SGE/drmaa. Has anyone else experienced and/or addressed this issue? Any comments or suggestions will be really helpful.

Thanks,
Shantanu.
[galaxy-dev] job status when SGE kills/aborts job
We are using an SGE cluster with our Galaxy install. We have specified resource and run-time limits for certain tools using tool-specific drmaa URL configuration, e.g.:

- run-time (h_rt, s_rt)
- memory (vf, h_vmem)

This helps the scheduler submit jobs to an appropriate node and also prevents a node from crashing because of excessive memory consumption. However, sometimes a job needs more resources and/or run-time than specified in the drmaa URL configuration. In such cases SGE kills the job and we get an email notification with an appropriate job summary. However, the Galaxy web interface doesn't show any error for such failures, and the job table doesn't contain any related state/info either. The jobs are shown in green boxes, meaning they completed without any failure. In reality these jobs have been killed/aborted by the scheduler. This is really confusing, as there is an inconsistency between the job status reported by Galaxy and by SGE/drmaa. Has anyone else experienced and/or addressed this issue? Any comments or suggestions will be really helpful.

Thanks,
Shantanu.