Hi George! I am not sure whether you are replying to Martin or to me. As a reminder, the original topic is running blast under GNU parallel with LSF; Martin's problem sounds somewhat off-topic.
You have both sysadmin and bioinformatics experience, so I would really appreciate your help! I am working on a cluster, so I must use LSF to get slots, and I would also prefer to use parallel, since it splits the input automatically with --recstart (which is quite nice :D; otherwise I would need another script for that). I see I could do better with the chunk size (I pass one record at a time in my example), but that is a secondary problem for now. First I have to solve the "lsb_launch(): Failed while waiting for tasks to finish." issue.

cheers,
g

On Wed, Apr 15, 2015 at 7:44 PM, George Marselis wrote:
> By the way, LSF and GNU parallel do almost the same thing, so using one of
> the two defeats the purpose of using the other.
>
> In the same way, you could have used LSF to submit your jobs to LSF:
>
> bsub < script.sh
>
> where script.sh was
>
> bsub -J amoeba -q smalljobs qfasta file1
> bsub -J amoeba -q smalljobs qfasta file2
> ...
> bsub -J amoeba -q smalljobs qfasta file2000
>
> On Wed, Apr 15, 2015 at 8:39 PM, George Marselis wrote:
>
>> Hi. LSF/Openlava sysadmin in bioinformatics and parallel user here.
>>
>> I have seen this a couple more times: you are trying to use GNU parallel
>> to submit the jobs to all nodes.
>>
>> That's not the way to do things: you should not submit jobs on *all* your
>> nodes. Please don't do that, as bsub was not designed to read large chunks
>> of jobs. bsub writes the jobs to your home directory, so if your storage is
>> not designed for a lot of writes, you are going to blow the cluster out of
>> the water.
>>
>> What you want to do is look up either:
>>
>> 1. bsub scripts
>> https://rc.fas.harvard.edu/resources/documentation/legacy-lsf/lsf-submit-an-lsf-job/
>>
>> or
>>
>> 2. job arrays
>> https://rc.fas.harvard.edu/resources/documentation/legacy-lsf/lsf-submitting-lots-of-short-jobs-job-arrays/
>>
>> Both bsub scripts and job arrays are useful to you. bsub scripts can be
>> submitted as part of a pipeline: you can have your pipeline write out the
>> bsub script and then submit it to bsub. So, instead of submitting your job
>> 2000 times as in
>>
>> bsub job0
>> bsub job1
>>
>> ....
>>
>> bsub job1999
>>
>> you just submit "bsub < scriptname", where scriptname contains 2000 lines
>> that describe your jobs, and you are done. The rest is done by bsub/LSF.
>>
>>
>> Now, if your jobs are similar in a way that you just increment a counter
>> (as in most bioinformatics jobs), use arrays:
>>
>> bsub -J JOBNAME[0-1999], where JOBNAME is a string you would like to
>> name your job as, e.g. "fasta files alignment"
>>
>>
>> These techniques are useful because you can submit all 2000 jobs in less
>> than a second, you can do it from a single node, and you will not have to
>> deal with a grumpy sysadmin or grumpy colleagues who cannot use the
>> cluster. Just make sure you use the appropriate queue.
>>
>> Let me know if you have any questions.
>>
>> Best Regards,
>>
>> George Marselis
>>
>> On Wed, Apr 15, 2015 at 6:48 PM, Martin d'Anjou wrote:
>>
>>> Hi,
>>>
>>> Thanks for clarifying. I want to use GNU Parallel to bsub jobs. This way
>>> I can use GNU Parallel to throttle the number of jobs that are submitted to
>>> LSF, and it is easier than writing a loop.
>>>
>>> parallel -j 100 my_script [bsub options] ::: {1..2000}
>>>
>>> my_script (pseudo-code):
>>> #!/bin/bash
>>> ...
>>> bsub [bsub options] command ...
>>> post-process data
>>>
>>> This way I can submit jobs, say, 100 at a time. When I submit all 2000
>>> jobs, it gets problematic and I start hitting limits with file descriptors,
>>> etc.
>>>
>>> Thanks for sharing,
>>> Martin
>>>
>>>
>>> On 15-04-15 11:35 AM, Giuseppe Aprea wrote:
>>>
>>> Hi Martin,
>>>
>>> I am not sure I understand. As far as I can see, things work exactly
>>> the opposite way: you have an LSF script which launches GNU Parallel on
>>> some hosts provided by LSF. Something like:
>>>
>>>
>>> -------------------------------------------------------------------------------
>>>
>>> -------------------------------------------------------------------------------
>>> #!/bin/bash
>>>
>>> #BSUB -J gnuParallel_blast_test   # Name of the job.
>>> #BSUB -o %J.out                   # Appends std output to file %J.out. (%J is the Job ID)
>>> #BSUB -e %J.err                   # Appends std error to file %J.err.
>>> #BSUB -q large                    # Queue name.
>>> #BSUB -n 30                       # Number of CPUs.
>>>
>>> module load 4.8.3/ncbi/12.0.0
>>> module load 4.8.3/parallel/20150122
>>>
>>> SLOTS=`cat ${LSB_DJOB_HOSTFILE} |wc -l`
>>>
>>> SERVER=""
>>>
>>> for i in `cat ${LSB_DJOB_HOSTFILE}| sort`
>>> do
>>> echo "/afs/enea.it/software/bin/blaunch.sh ${i}" >> servers
>>> done
>>>
>>> cat absolute_path_to_sequences.fasta | parallel --no-notice -vv \
>>>   -j ${SLOTS} --slf servers --plain --recstart '>' -N 1 --pipe \
>>>   blastp -evalue 1e-05 -outfmt 6 -db absolute_path_to_db_file \
>>>   -query - -out absolute_path_to_result_file_{%}
>>>
>>> -------------------------------------------------------------------------------
>>>
>>> -------------------------------------------------------------------------------
>>>
>>> LSF is the one which gives you the execution hosts, so if you are
>>> launching bsub from GNU parallel, how do you know how to set the --slf
>>> option?
>>>
>>>
>>> g
>>>
>>>
>>>
>>> On Wed, Apr 15, 2015 at 4:24 PM, Martin d'Anjou wrote:
>>>
>>>> On 15-04-15 09:34 AM, Giuseppe Aprea wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to ask you, please, for some help using parallel with
>>>>> the blast alignment software.
>>>>>
>>>>>
>>>>> I am trying to use GNU parallel v. 20150122 with blast for a very
>>>>> large sequence alignment. I am using parallel on a cluster which uses
>>>>> LSF as its queue system.
>>>>>
>>>>
>>>> Hello Giuseppe,
>>>>
>>>> I am an avid LSF user, and I want to use GNU Parallel to dispatch jobs
>>>> to LSF. Could you please explain a little bit to me how GNU Parallel works
>>>> with LSF? I do not see it in the online tutorials. For example, I would
>>>> like to understand how to pass "bsub" options like -oo, -q queue_name, etc.
>>>> to LSF from GNU Parallel.
>>>>
>>>> Thanks,
>>>> Martin
>>>>
>>>>
>>>>
>>>
>>>
>>
>
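A minimal sketch, not from the thread and untested against the lsb_launch() error, of one way to build the --slf file in Giuseppe's LSF script. It assumes, as his `wc -l` slot count already does, that $LSB_DJOB_HOSTFILE lists each host once per allocated slot, and it relies on GNU parallel's `ncpus/sshlogin` form with a custom ssh command (here the site's blaunch.sh wrapper) so that each host gets a single line carrying its slot count, e.g. `2//afs/enea.it/software/bin/blaunch.sh node01` for a hypothetical host node01. Since parallel's -j applies per sshlogin, `-j ${SLOTS}` can start up to SLOTS jobs on every host rather than SLOTS in total; the sketch drops -j and lets the ncpus prefix set each host's job slots. It also names output files with `{#}` (job sequence number) instead of `{%}` (job slot number): with -N 1 every record is its own job, so jobs reusing a slot would overwrite each other's blastp output. The helper name `make_sshloginfile` is hypothetical.

```shell
#!/bin/bash
# Sketch (assumption, not from the thread): build a GNU parallel --slf file
# with one line per host, "ncpus/launcher host", from the LSF host file,
# which is assumed to list each host once per allocated slot.

# make_sshloginfile HOSTFILE LAUNCHER   (hypothetical helper)
make_sshloginfile() {
  local hostfile=$1 launcher=$2
  # uniq -c prints "<count> <host>"; read splits it into the two fields.
  sort "$hostfile" | uniq -c | while read -r slots host; do
    printf '%s/%s %s\n' "$slots" "$launcher" "$host"
  done
}

# Do the real work only inside an LSF job, where LSB_DJOB_HOSTFILE is set.
if [ -n "${LSB_DJOB_HOSTFILE:-}" ]; then
  # '>' rather than '>>' so a rerun does not append to a stale servers file.
  make_sshloginfile "$LSB_DJOB_HOSTFILE" \
    /afs/enea.it/software/bin/blaunch.sh > servers
  # No -j: each host's ncpus prefix sets its number of job slots.
  # {#} is the job sequence number, so each record gets its own output file.
  parallel --no-notice --slf servers --plain --recstart '>' -N 1 --pipe \
    blastp -evalue 1e-05 -outfmt 6 -db absolute_path_to_db_file \
    -query - -out 'absolute_path_to_result_file_{#}' \
    < absolute_path_to_sequences.fasta
fi
```

Outside an LSF job the guarded block is skipped and only the helper is defined, which makes it easy to inspect the generated servers file before relying on it.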

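George's job-array suggestion can be sketched as follows, as an illustration under stated assumptions rather than a tested recipe. The hypothetical helper `split_fasta` does the splitting that `--recstart '>'` did inside parallel, writing chunk files numbered from 1 and printing how many it wrote; the hypothetical `submit_blast_array` then submits a single array in which each element aligns one chunk. The `%100` suffix on the job name is LSF's limit on how many array elements run at once, which gives the kind of throttling Martin wanted, and because the whole array is one submission it may also avoid the problems he saw when submitting 2000 separate jobs. The queue `smalljobs`, the chunk prefix, and the database path are placeholders taken from the thread; the sketch assumes LSF runs the job command through a shell on the execution host, so the single-quoted `${LSB_JOBINDEX}` is expanded there, once per element.

```shell
#!/bin/bash
# Sketch (assumption): pre-split the FASTA and run one LSF job array
# element per chunk, instead of running parallel inside one allocation.

# split_fasta INPUT RECORDS_PER_CHUNK PREFIX   (hypothetical helper)
# Writes PREFIX1.fasta, PREFIX2.fasta, ... and prints the number of chunks.
# Assumes INPUT starts with a '>' header line and RECORDS_PER_CHUNK > 0.
split_fasta() {
  awk -v n="$2" -v p="$3" '
    /^>/ {
      # Open a new chunk file every n records.
      if (records % n == 0) {
        if (out != "") close(out)
        chunks++
        out = p chunks ".fasta"
      }
      records++
    }
    # Copy every line (header or sequence) into the current chunk.
    { print > out }
    END { print chunks + 0 }
  ' "$1"
}

# submit_blast_array NCHUNKS PREFIX   (hypothetical helper)
# Indices run from 1 to match split_fasta's numbering; %I in -o is the
# array index and %J the job ID.
submit_blast_array() {
  local nchunks=$1 prefix=$2
  # Single quotes keep ${LSB_JOBINDEX} literal here; it is meant to be
  # expanded by the shell LSF starts for each element (assumption).
  bsub -J "blast[1-${nchunks}]%100" -q smalljobs -o 'blast.%J.%I.log' \
    blastp -evalue 1e-05 -outfmt 6 -db absolute_path_to_db_file \
    -query "${prefix}"'${LSB_JOBINDEX}.fasta' \
    -out 'result_${LSB_JOBINDEX}.tsv'
}
```

For example, `n=$(split_fasta absolute_path_to_sequences.fasta 100 chunk_)` followed by `submit_blast_array "$n" chunk_` would submit one array element per 100 records, which also addresses the chunk-size question raised at the top of the thread.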