> You could use parallel to submit jobs, but it's a very bad idea, due to the limitations of the software.

and by that, I mean the limitations of the LSF software. Parallel rocks. Ole rocks. Next Parallel Release should be named "GNU Terry Pratchett"

> Let me know which option is better for you.
> As with regard to Martin, he should not use parallel for

^-- this is your brain on Perl.

Best Regards,
George

On Wed, Apr 15, 2015 at 11:27 PM, George Marselis <[email protected]> wrote:

> Giuseppe, I was referring to both of you. My apologies I was not clear, I
> had my head stuck in Perl while writing the first email.
>
> My suggestion to both of you is that you should not use parallel for your
> respective topics.
>
> Giuseppe,
>
> You should use an extra script. Your problem is that you are timing out
> while trying to submit all those jobs. The timeout happens because of the
> number of jobs you are submitting: LSF cannot write the job descriptions
> fast enough to disk, times out because the action is not completed and then
> stays in that state.
>
> ----------------
> Martin,
>
> You could use parallel to submit jobs, but it's a very bad idea, due to the
> limitations of the software. Use batch scripts and job arrays when possible.
>
> ----------------
>
> So, as per my suggestion, I think our discussion is offtopic for this
> list. We could continue here, if Ole and the list put up with us, but I
> think we should take this on a personal email or switch this to the Debian
> Medical email list https://en.wikipedia.org/wiki/Debian-Med .
>
> Let me know which option is better for you.
>
> As with regard to Martin, he should not use parallel for
>
> Ciao,
>
> George
>
> On Wed, Apr 15, 2015 at 10:50 PM, Giuseppe Aprea <[email protected]> wrote:
>
>> Hi George!
>>
>> I am not sure who you are talking with. Martin or me? I remind the
>> original topic is about using blast under parallel with LSF.
>> Martin's problem sounds like something offtopic.
>>
>> You have both sysadmin and bioinformatics experience so I would really
>> appreciate your help!
>>
>> I am working on a cluster so I must use LSF to get slots and I would
>> prefer using parallel also since it splits input automatically with
>> --recstart (which is quite nice:D otherwise I have to use another script
>> for that). I see I could do better with chunksize (I have 1 record at time
>> in my example) but that's a secondary problem now. First I have the
>> "lsb_launch(): Failed while waiting for tasks to finish." issue to solve.
>>
>> cheers,
>>
>> g
>>
>> On Wed, Apr 15, 2015 at 7:44 PM, George Marselis <[email protected]>
>> wrote:
>>
>>> By the way, LSF and GNU parallel do almost the same thing. So using one
>>> of the two defeats the purpose of using the other.
>>>
>>> In the same way, you could have used LSF to submit your jobs to LSF:
>>>
>>> bsub < script.sh
>>>
>>> where script.sh was
>>>
>>> bsub -J amoeba -q smalljobs qfasta file1
>>> bsub -J amoeba -q smalljobs qfasta file2
>>> ...
>>> bsub -J amoeba -q smalljobs qfasta file2000
>>>
>>> On Wed, Apr 15, 2015 at 8:39 PM, George Marselis <[email protected]>
>>> wrote:
>>>
>>>> Hi. LSF/Openlava sysadmin in bioinformatics and parallel user here.
>>>>
>>>> I have seen this a couple more times: You are trying to use GNU
>>>> parallel to submit the jobs to all nodes.
>>>>
>>>> That's not the way to do things: You should not submit jobs on *all*
>>>> your nodes. Please don't do that, as bsub was not designed to read large
>>>> chunks of jobs. bsub writes the jobs to your home directory, so if your
>>>> storage is not designed for a lot of writes, you are going to blow the
>>>> cluster out of the water.
>>>>
>>>> What you want to do is look up either:
>>>>
>>>> 1. bsub scripts
>>>> https://rc.fas.harvard.edu/resources/documentation/legacy-lsf/lsf-submit-an-lsf-job/
>>>>
>>>> or
>>>>
>>>> 2. job arrays
>>>> https://rc.fas.harvard.edu/resources/documentation/legacy-lsf/lsf-submitting-lots-of-short-jobs-job-arrays/
>>>>
>>>> Both bsub scripts and job arrays are useful to you: bsub scripts can be
>>>> submitted as part of a pipeline: you can program the output of the bsub
>>>> script from your pipeline and then submit it to bsub. So, instead of
>>>> submitting your job 2000 times as in
>>>>
>>>> bsub job0
>>>> bsub job1
>>>>
>>>> ....
>>>>
>>>> bsub job1999
>>>>
>>>> you just submit "bsub < scriptname" which contains 2000 lines which
>>>> describe your jobs and you are done. The rest is done by bsub/LSF.
>>>>
>>>> Now, if your jobs are similar in a way that you just increment a counter
>>>> (as in most bioinformatics jobs), use arrays.
>>>>
>>>> bsub -J JOBNAME[0-1999], where JOBNAME is a string you would like to
>>>> name your job as, eg "fasta files alignment"
>>>>
>>>> These techniques are useful because you can submit all 2000 jobs in
>>>> less than a second, you can do it from a single node and you will not have
>>>> to deal with a grumpy sysadmin or grumpy colleagues who cannot use the
>>>> cluster. Just make sure you use the appropriate queue.
>>>>
>>>> Let me know if you have any questions.
>>>>
>>>> Best Regards,
>>>>
>>>> George Marselis
>>>>
>>>> On Wed, Apr 15, 2015 at 6:48 PM, Martin d'Anjou <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for clarifying. I want to use GNU Parallel to bsub jobs. This
>>>>> way I can use GNU Parallel to throttle the number of jobs that are
>>>>> submitted to LSF, and it is easier than writing a loop.
>>>>>
>>>>> parallel -j 100 my_script [bsub options] ::: {1..2000}
>>>>>
>>>>> my_script (pseudo-code):
>>>>> #!/bin/bash
>>>>> ...
>>>>> bsub [bsub options] command ...
>>>>> post-process data
>>>>>
>>>>> This way I can submit jobs, say 100 at a time. When I submit all 2000
>>>>> jobs, it gets problematic and I start hitting limits with file
>>>>> descriptors, etc.
>>>>>
>>>>> Thanks for sharing,
>>>>> Martin
>>>>>
>>>>> On 15-04-15 11:35 AM, Giuseppe Aprea wrote:
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> I am not sure I understand. As far as I can see, things work exactly
>>>>> the opposite way: you have an LSF script which launches GNU Parallel on
>>>>> some hosts provided by LSF. Something like:
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>> #!/bin/bash
>>>>>
>>>>> #BSUB -J gnuParallel_blast_test   # Name of the job.
>>>>> #BSUB -o %J.out                   # Appends std output to file %J.out. (%J is the Job ID)
>>>>> #BSUB -e %J.err                   # Appends std error to file %J.err.
>>>>> #BSUB -q large                    # Queue name.
>>>>> #BSUB -n 30                       # Number of CPUs.
>>>>>
>>>>> module load 4.8.3/ncbi/12.0.0
>>>>> module load 4.8.3/parallel/20150122
>>>>>
>>>>> SLOTS=`cat ${LSB_DJOB_HOSTFILE} |wc -l`
>>>>>
>>>>> SERVER=""
>>>>>
>>>>> for i in `cat ${LSB_DJOB_HOSTFILE}| sort`
>>>>> do
>>>>>   echo "/afs/enea.it/software/bin/blaunch.sh ${i}" >> servers
>>>>> done
>>>>>
>>>>> cat absolute_path_to_sequences.fasta | parallel --no-notice -vv -j \
>>>>>   ${SLOTS} --slf servers --plain --recstart '>' -N 1 --pipe blastp -evalue \
>>>>>   1e-05 -outfmt 6 -db absolute_path_to_db_file -query - -out \
>>>>>   absolute_path_to_result_file_{%}
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> LSF is the one which gives you the execution hosts so if you are
>>>>> launching bsub from GNU parallel how do you know how to set the --slf
>>>>> option?
>>>>>
>>>>> g
>>>>>
>>>>> On Wed, Apr 15, 2015 at 4:24 PM, Martin d'Anjou <[email protected]> wrote:
>>>>>
>>>>>> On 15-04-15 09:34 AM, Giuseppe Aprea wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to ask you, please, some help in using parallel with
>>>>>>> blast alignment software.
>>>>>>>
>>>>>>> I am trying to use GNU parallel v. 20150122 with blast for a very
>>>>>>> large sequences alignment. I am using Parallel on a cluster which uses
>>>>>>> LSF as queue system.
>>>>>>>
>>>>>>
>>>>>> Hello Giuseppe,
>>>>>>
>>>>>> I am an avid LSF user, and I want to use GNU Parallel to dispatch
>>>>>> jobs to LSF. Could you please explain a little bit to me how GNU Parallel
>>>>>> works with LSF? I do not see it in the on-line tutorials. For example, I
>>>>>> would like to understand how to pass "bsub" options like -oo, -q
>>>>>> queue_name, etc. to LSF from GNU Parallel.
>>>>>>
>>>>>> Thanks,
>>>>>> Martin
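For anyone reading this thread later, here is a minimal sketch of the job-array approach recommended above. It is not from any of the posters. The script name worker.sh and the helper input_for_index are hypothetical; the file1 ... file2000 naming, the amoeba job name, the smalljobs queue and the qfasta command are borrowed from the bsub example in the thread. As far as I know, LSF sets LSB_JOBINDEX for each array element (and to 0 for non-array jobs) and expects array indices to be positive, so the sketch uses 1-2000 rather than 0-1999. Check your site's LSF documentation before relying on either point.

```shell
#!/bin/bash
# worker.sh: hypothetical per-element script for an LSF job array (a sketch).

input_for_index() {
    # Map an array index to its input file name (file1 ... file2000 assumed).
    printf 'file%s\n' "$1"
}

# Act only when LSF has supplied a real array index.
# Assumption: LSB_JOBINDEX is unset outside LSF and 0 for non-array jobs.
if [ -n "${LSB_JOBINDEX:-}" ] && [ "${LSB_JOBINDEX}" != "0" ]; then
    # Replace the echo with the real command, e.g. qfasta from the thread.
    echo "would run: qfasta $(input_for_index "${LSB_JOBINDEX}")"
fi

# Submitted once, from a single node, instead of 2000 separate bsub calls:
#   bsub -J "amoeba[1-2000]" -q smalljobs -o "amoeba.%J.%I.out" ./worker.sh
```

Run outside LSF, the script prints nothing, which makes it safe to test the index-to-file mapping on a login node before submitting.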
