Hi Phil,

I followed up on the GitHub issue. Let's track it there, and we can reply here for the sake of history once we figure out what's going on.
Thanks,
--nate

On Thu, Sep 12, 2019 at 12:32 AM Philip Blood <bl...@psc.edu> wrote:

> Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to
> do more focused testing, and it looks like the issue with running Slurm
> jobs from Galaxy comes down to *slurm-drmaa not working with the latest
> version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa
> GitHub page here <https://github.com/natefoo/slurm-drmaa/issues/32>.
>
> Since 18.08.8 addresses a security vulnerability
> <https://www.schedmd.com/news.php> that is not addressed in previous
> versions of Slurm, it seems like this slurm-drmaa problem will be an
> important issue to address for all those running Galaxy jobs on Slurm
> clusters.
>
> If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8, I'd
> be interested to hear it.
>
> Phil
>
> On Tue, Sep 3, 2019 at 2:29 PM Philip Blood <bl...@psc.edu> wrote:
>
> > Hi Folks,
> >
> > I'm trying to get an old instance of Galaxy (16.01) working for a user who
> > needs to use it this week for a class he is teaching (so upgrading Galaxy
> > is not an option at the moment). Due to a recent slurm upgrade on our
> > compute system to slurm 18.08.8, we had to replace the old slurm-drmaa
> > 1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which doesn't
> > work with 18.08.8, with Nate's forked slurm-drmaa library version
> > 1.1.0 <https://github.com/natefoo/slurm-drmaa>. That built fine with
> > slurm 18.08.8 and (I think) we updated all the relevant pointers in the
> > galaxy config to point to the new slurm-drmaa 1.1.0 library.
> >
> > However, now when I try to run jobs on our system I get errors (it worked
> > fine before with slurm-drmaa 1.0.7 and the older version of slurm).
> > So, I wanted to get a quick sanity check on whether this might be an issue with
> > trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or
> > if anyone has any other quick thoughts on troubleshooting this. The errors
> > I get are below.
> >
> > Best,
> > Phil
> >
> > *Short version (just the errors):*
> > 198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
> > 198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
> >
> >
> > *Full context:*
> > 198.91.54.159 - - [31/Aug/2019:16:30:27 +0000] "GET /api/tools/squeue/build HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > galaxy.tools DEBUG 2019-08-31 16:30:30,142 Validated and populated state for tool request (4.081 ms)
> > galaxy.tools.actions INFO 2019-08-31 16:30:30,285 Handled output (100.616 ms)
> > galaxy.tools.actions INFO 2019-08-31 16:30:30,319 Verified access to datasets (0.005 ms)
> > galaxy.tools.execute DEBUG 2019-08-31 16:30:30,368 Tool [squeue] created job [10] (206.086 ms)
> > galaxy.tools.execute DEBUG 2019-08-31 16:30:30,376 Executed all jobs for tool request: (233.862 ms)
> > 198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "POST /api/tools HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > galaxy.jobs DEBUG 2019-08-31 16:30:30,747 (10) Working directory for job is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10
> > galaxy.jobs.handler DEBUG 2019-08-31 16:30:30,751 (10) Dispatching to slurm runner
> > galaxy.jobs DEBUG 2019-08-31 16:30:30,774 (10) Persisting job destination (destination id: LM4)
> > galaxy.jobs.runners DEBUG 2019-08-31 16:30:30,790 Job [10] queued (38.578 ms)
> > galaxy.jobs.handler INFO 2019-08-31 16:30:30,818 (10) Job dispatched
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,012 Building dependency shell command for dependency 'slurm'
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Find dependency slurm version None
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,014 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc78c334750> (isnull? False)
> > galaxy.jobs.command_factory INFO 2019-08-31 16:30:31,057 Built script [/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh] for tool command[PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8/env.sh; echo "hostname:" > output; echo " " >> output; hostname >> output; echo " " >> output; env >> output; echo " " >> output; date >> output; echo " " >> output; echo "Uptime:" >> output; echo " " >> output; uptime >> output; echo " " >> output; echo "Modules:" >> output; echo " " >> output; module avail >> output 2>&1; echo " " >> output; echo "SLURM Queue Status" >> output; echo " " >> output; echo "If your job is running on the queues, it will be listed in the reports below:" >> output; echo " " >> output; echo " " >> output; echo "Normal Report: squeue" >> output; echo " " >> output; echo " " >> output; echo " " >> output; squeue >> output; date >> output; echo " " >> output; echo " " >> output; echo "*** Full Report: squeue -l ***" >> output; echo " " >> output; squeue -l >> output; echo " " >> output; echo " " >> output; date >> output; echo " " >> output; echo "Local: ${LOCAL}" >> output; echo "Ramdisk: ${RAMDISK}" >> output; workdir=`pwd`; echo "workdir is $workdir" >> output; cd $LOCAL; echo "i am in `pwd`" >> $workdir/output; cd $workdir; echo "i am in `pwd`" >> output; date >> output; echo " " >> output]
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Building dependency shell command for dependency 'samtools'
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Find dependency samtools version None
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,279 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc7b01e01d0> (isnull? False)
> > galaxy.jobs.runners DEBUG 2019-08-31 16:30:31,284 (10) command is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh; return_code=$?; if [ -f /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output ] ; then cp /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output /opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat ; fi; PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19/env.sh; python "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/set_metadata_jGYkkM.py" "/opt/packages/galaxy/galaxy01/tmp/tmpxmB5GA" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy.json" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_in_HistoryDatasetAssociation_10_u2k7qq,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_kwds_HistoryDatasetAssociation_10_2ZmSXR,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_out_HistoryDatasetAssociation_10_shC78c,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_results_HistoryDatasetAssociation_10_57x96D,/opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_override_HistoryDatasetAssociation_10_AEeOfc" 5242880; sh -c "exit $return_code"
> > galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) submitting file /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy_10.sh
> > galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) native specification is: -p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00
> > 198.91.54.159 - - [31/Aug/2019:16:30:34 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:38 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:42 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:47 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:51 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:55 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:59 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:03 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:08 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:12 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:16 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:20 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:24 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
> > 198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
> >
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> %(web_page_url)s
>
> To search Galaxy mailing lists use the unified search at:
> http://galaxyproject.org/search/
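
For anyone who wants to pull the runner warnings and errors out of a long Galaxy log, the way the "short version" in the quoted message was put together, here is a minimal sketch. It assumes log lines shaped like the ones in this thread (logger name, level, timestamp, message). The regular expression and the function name `runner_problems` are illustrative only and are not part of Galaxy or slurm-drmaa.

```python
import re

# Pattern inferred from the log excerpts in this thread (logger, level,
# timestamp, message); it is an assumption, not Galaxy's documented format.
# A leading or trailing "*" is tolerated because the mail client wrapped
# some lines in bold markers.
LOG_LINE = re.compile(
    r"^\*?(?P<logger>galaxy(?:\.\w+)*) "
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<message>.*?)\s*\*?\s*$"
)


def runner_problems(lines, logger="galaxy.jobs.runners.drmaa",
                    levels=("WARNING", "ERROR")):
    """Yield (timestamp, level, message) for matching log lines."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # e.g. web access-log lines, which start with an IP
        if match.group("logger") == logger and match.group("level") in levels:
            yield (match.group("timestamp"), match.group("level"),
                   match.group("message"))


if __name__ == "__main__":
    # Sample lines copied from the log in this thread.
    sample = [
        "galaxy.jobs.handler INFO 2019-08-31 16:30:30,818 (10) Job dispatched",
        "*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) "
        "drmaa.Session.runJob() failed, will retry: code 1: "
        "slurm_submit_batch_job error (2): No such file or directory*",
        "*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) "
        "All attempts to submit job failed *",
    ]
    for timestamp, level, message in runner_problems(sample):
        print(timestamp, level, message)
```

The same function accepts an open file handle, so it can be pointed at a full Galaxy log instead of the sample list.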