> You could use parallel to submit jobs, but it's a very bad idea, due to
> the limitations of the software.


and by that, I mean the limitations of the LSF software.

Parallel rocks. Ole rocks.

Next Parallel Release should be named "GNU Terry Pratchett"

> Let me know which option is better for you.
> As with regard to Martin, he should not use parallel for

^-- this is your brain on Perl.

Best Regards,

George


On Wed, Apr 15, 2015 at 11:27 PM, George Marselis <[email protected]> wrote:

> Giuseppe, I was referring to both of you. My apologies I was not clear, I
> had my head stuck in Perl while writing the first email.
>
> My suggestion to both of you is that you should not use parallel for your
> respective topics.
>
> Giuseppe,
>
> You should use an extra script. Your problem is that you are timing out
> while trying to submit all those jobs. The timeout happens because of the
> number of jobs you are submitting: LSF cannot write the job descriptions
> to disk fast enough, so the submission times out before it completes and
> then stays in that state.
>
> ----------------
> Martin,
>
> You could use parallel to submit jobs, but it's a very bad idea, due to the
> limitations of the software. Use batch scripts and job arrays when possible.
>
> ----------------
>
> So, as per my suggestion, I think our discussion is off-topic for this
> list. We could continue here, if Ole and the list put up with us, but I
> think we should take this to personal email or switch to the Debian
> Medical email list https://en.wikipedia.org/wiki/Debian-Med .
>
> Let me know which option is better for you.
>
> As with regard to Martin, he should not use parallel for
>
>
> Ciao,
>
> George
>
> On Wed, Apr 15, 2015 at 10:50 PM, Giuseppe Aprea <[email protected]
> > wrote:
>
>> Hi George!
>>
>> I am not sure who you are talking to: Martin or me? As a reminder, the
>> original topic is about using blast under parallel with LSF.
>> Martin's problem sounds off-topic.
>>
>> You have both sysadmin and bioinformatics experience so I would really
>> appreciate your help!
>>
>> I am working on a cluster so I must use LSF to get slots and I would
>> prefer using parallel also since it splits input automatically with
>> --recstart (which is quite nice:D otherwise I have to use another script
>> for that). I see I could do better with chunksize (I have 1 record at a time
>> in my example) but that's a secondary problem now. First I have the
>> "lsb_launch(): Failed while waiting for tasks to finish." issue to solve.
>>
>> cheers,
>>
>> g
>>
>>
>>
>>
>> On Wed, Apr 15, 2015 at 7:44 PM, George Marselis <[email protected]>
>> wrote:
>>
>>> By the way, LSF and GNU parallel do almost the same thing, so using one
>>> of the two defeats the purpose of using the other.
>>>
>>> In the same way, you could have used LSF to submit your jobs to LSF:
>>>
>>> bsub < script.sh
>>>
>>> where script.sh was
>>>
>>> bsub -J amoeba -q smalljobs  qfasta file1
>>> bsub -J amoeba -q smalljobs  qfasta file2
>>> ...
>>> bsub -J amoeba -q smalljobs  qfasta file2000
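
A minimal sketch of how such a script.sh could be generated instead of typed by hand. The job name (amoeba), queue (smalljobs) and command (qfasta) come from the example above; the file1 .. file2000 naming and the generator loop itself are assumptions made for illustration. The sketch only writes the file and never calls bsub:

```shell
#!/bin/bash
# Generate script.sh with one bsub line per input file.
# Assumption: the inputs are named file1 .. file2000, as in the example above.
n_jobs=2000
out=script.sh

# Redirecting the whole loop overwrites any previous script.sh in one go.
for i in $(seq 1 "$n_jobs"); do
    printf 'bsub -J amoeba -q smalljobs qfasta file%d\n' "$i"
done > "$out"

echo "wrote $out"
# After checking the file, submit it yourself:
#   bsub < script.sh
```

Keeping the generation separate from the submission means the file can be inspected (or diffed against a previous run) before anything reaches the scheduler.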
>>>
>>> On Wed, Apr 15, 2015 at 8:39 PM, George Marselis <[email protected]>
>>> wrote:
>>>
>>>> Hi. LSF/Openlava sysadmin in bioinformatics and parallel user here.
>>>>
>>>> I have seen this a couple more times: You are trying to use GNU
>>>> parallel to submit the jobs to all nodes.
>>>>
>>>> That's not the way to do things: You should not submit jobs on *all*
>>>> your nodes. Please don't do that, as bsub was not designed to read large
>>>> chunks of jobs. bsub writes the jobs to your home directory, so if your
>>>> storage is not designed for a lot of writes, you are going to blow the
>>>> cluster out of the water.
>>>>
>>>> What you want to do is look up either:
>>>>
>>>> 1. bsub scripts
>>>> https://rc.fas.harvard.edu/resources/documentation/legacy-lsf/lsf-submit-an-lsf-job/
>>>>
>>>> or
>>>>
>>>> 2. job arrays
>>>> https://rc.fas.harvard.edu/resources/documentation/legacy-lsf/lsf-submitting-lots-of-short-jobs-job-arrays/
>>>>
>>>> Both bsub scripts and job arrays are useful to you: bsub scripts can be
>>>> submitted as part of a pipeline: you can program the output of the bsub
>>>> script from your pipeline and then submit it to bsub. So, instead of
>>>> submitting your job 2000 times as in
>>>>
>>>> bsub job0
>>>> bsub job1
>>>>
>>>> ....
>>>>
>>>> bsub job1999
>>>>
>>>> you just submit "bsub < scriptname" which contains 2000 lines which
>>>> describe your jobs and you are done. The rest is done by bsub/LSF.
>>>>
>>>>
>>>> Now, if your jobs are similar in such a way that you just increment a
>>>> counter (as in most bioinformatics jobs), use arrays.
>>>>
>>>> bsub -J JOBNAME[0-1999], where JOBNAME is a string you would like to
>>>> name your job as, eg "fasta files alignment"
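
A rough sketch of what an array job script might look like. The job name fasta_align, the queue smalljobs, the qfasta command and the fileN input naming are placeholders borrowed from elsewhere in this thread, not a tested setup. Two assumptions to check against your site's LSF documentation: I use indices 1-2000 because, as far as I know, LSF array indices must be positive integers, and the %100 suffix asks LSF to keep at most 100 elements running at once. The array spec is quoted so the shell does not treat the brackets as a glob.

```shell
#!/bin/bash
#BSUB -J "fasta_align[1-2000]%100"   # array of 2000 elements, at most 100 running (assumed limit)
#BSUB -q smalljobs                   # placeholder queue name
#BSUB -o fasta_align.%J.%I.out       # %J is the job ID, %I the array index

# Map an array index to an input file name.
# Assumption: inputs are named file1 .. file2000.
input_for_index() {
    printf 'file%s\n' "$1"
}

# LSF sets LSB_JOBINDEX for each array element; default to 1 so the
# script can also be dry-run outside the cluster.
index="${LSB_JOBINDEX:-1}"
input="$(input_for_index "$index")"

echo "array element ${index} would process ${input}"
# Replace the echo with the real command once the mapping looks right:
#   qfasta "$input"
```

Saved as, say, array_job.sh (a hypothetical name), this would be submitted once with "bsub < array_job.sh". The %100 limit is also one way to get the "100 at a time" throttling discussed below without wrapping bsub in parallel.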
>>>>
>>>>
>>>> These techniques are useful because you can submit all 2000 jobs in
>>>> less than a second, you can do it from a single node and you will not have
>>>> to deal with a grumpy sysadmin or grumpy colleagues who cannot use the
>>>> cluster. Just make sure you use the appropriate queue.
>>>>
>>>> Let me know if you have any questions.
>>>>
>>>> Best Regards,
>>>>
>>>> George Marselis
>>>>
>>>> On Wed, Apr 15, 2015 at 6:48 PM, Martin d'Anjou <
>>>> [email protected]> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>> Thanks for clarifying. I want to use GNU Parallel to bsub jobs. This
>>>>> way I can use GNU Parallel to throttle the number of jobs that are
>>>>> submitted to LSF, and it is easier than writing a loop.
>>>>>
>>>>> parallel -j 100 my_script [bsub options] ::: {1..2000}
>>>>>
>>>>> my_script (pseudo-code):
>>>>> #!/bin/bash
>>>>> ...
>>>>> bsub [bsub options] command ...
>>>>> post-process data
>>>>>
>>>>> This way I can submit jobs, say 100 at a time. When I submit all 2000
>>>>> jobs, it gets problematic and I start hitting limits with file
>>>>> descriptors, etc.
>>>>>
>>>>> Thanks for sharing,
>>>>> Martin
>>>>>
>>>>>
>>>>> On 15-04-15 11:35 AM, Giuseppe Aprea wrote:
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>>  I am not sure I understand. As far as I can see, things work exactly
>>>>> the opposite way: you have an LSF script which launches GNU Parallel on
>>>>> some hosts provided by LSF. Something like:
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>> #!/bin/bash
>>>>>
>>>>> #BSUB -J gnuParallel_blast_test   # Name of the job.
>>>>> #BSUB -o %J.out                   # Appends std output to file %J.out. (%J is the Job ID)
>>>>> #BSUB -e %J.err                   # Appends std error to file %J.err.
>>>>> #BSUB -q large                    # Queue name.
>>>>> #BSUB -n 30                       # Number of CPUs.
>>>>>
>>>>> module load 4.8.3/ncbi/12.0.0
>>>>> module load 4.8.3/parallel/20150122
>>>>>
>>>>> SLOTS=`cat ${LSB_DJOB_HOSTFILE} |wc -l`
>>>>>
>>>>> SERVER=""
>>>>>
>>>>> for i in `cat ${LSB_DJOB_HOSTFILE}| sort`
>>>>> do
>>>>>   echo "/afs/enea.it/software/bin/blaunch.sh ${i}" >> servers
>>>>> done
>>>>>
>>>>> cat absolute_path_to_sequences.fasta | parallel --no-notice -vv \
>>>>>   -j ${SLOTS} --slf servers --plain --recstart '>' -N 1 --pipe \
>>>>>   blastp -evalue 1e-05 -outfmt 6 -db absolute_path_to_db_file \
>>>>>   -query - -out absolute_path_to_result_file_{%}
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>>  LSF is the one which gives you the execution hosts so if you are
>>>>> launching bsub from GNU parallel how do you know how to set the --slf
>>>>> option?
>>>>>
>>>>>
>>>>>  g
>>>>>
>>>>>
>>>>>
>>>>>   On Wed, Apr 15, 2015 at 4:24 PM, Martin d'Anjou <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> On 15-04-15 09:34 AM, Giuseppe Aprea wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to ask for some help, please, in using parallel with
>>>>>>> the blast alignment software.
>>>>>>>
>>>>>>>
>>>>>>> I am trying to use GNU parallel v. 20150122 with blast for a very
>>>>>>> large sequence alignment. I am using Parallel on a cluster which uses
>>>>>>> LSF as its queue system.
>>>>>>>
>>>>>>
>>>>>>  Hello Giuseppe,
>>>>>>
>>>>>> I am an avid LSF user, and I want to use GNU Parallel to dispatch
>>>>>> jobs to LSF. Could you please explain a little bit to me how GNU Parallel
>>>>>> works with LSF? I do not see it in the on-line tutorials. For example, I
>>>>>> would like to understand how to pass "bsub" options like -oo, -q
>>>>>> queue_name, etc. to LSF from GNU Parallel.
>>>>>>
>>>>>> Thanks,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
