>Sorry this is old but I tried recompiling torque after setting the
>NCONNECTS to 20 and the issue's still there.
>
>But there's more: It doesn't affect only flagstat but other non-linear
>workflows. One of the two jobs that are submitted when their "father"
>stopped running triggers the same error:
>PBS error 15033: No free connections
>
>And the best part is that it works fine sometimes. But when it crashes,
>it's always the same job that crashes.
>
>Does anyone have a clue?
>
>Cheers,
>L-A


I was getting the same behavior on asynchronous workflows on a multicore
computer that acts as both the head node and the compute node for the
TORQUE system. Even after recompiling with a higher NCONNECTS I was getting
the same error. I suspect this is due to Galaxy opening multiple connections
to check the status of currently running jobs. Because there can be many
status checks in an asynchronous workflow, the PBS server may or may not
have a free connection, depending on when the job submission comes in. To
deal with this I modified lib/galaxy/jobs/runners/pbs.py to retry the
submission when it fails, in the following way:

@@ -286,6 +286,12 @@ class PBSJobRunner( BaseJobRunner ):
         log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
         log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
         job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+        ## Modified to give ten tries for qsubbing a job
+        num_try = 0
+        while not job_id and num_try < 10:
+            job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+            num_try += 1
+
         pbs.pbs_disconnect(c)

         # check to see if it submitted


I haven't had any problems since.
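
For anyone who wants the same idea outside of pbs.py, here is a minimal
standalone sketch of the retry loop. The names submit_with_retry and
make_flaky_submit are hypothetical helpers written for illustration; they
are not part of Galaxy or pbs_python. The sketch assumes, as the patch
above does, that a failed submission returns a falsy job id. Unlike the
patch, it pauses briefly between attempts (the delay is an assumption, not
something tested against TORQUE), and max_tries counts the total number of
attempts rather than retries after the first one.

```python
import time


def submit_with_retry(submit, max_tries=10, delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a truthy job id or max_tries is hit.

    submit is any zero-argument callable; inside pbs.py it would wrap the
    pbs.pbs_submit(...) call. Returns a (job_id, attempts_made) tuple.
    """
    job_id = None
    attempts = 0
    while not job_id and attempts < max_tries:
        if attempts:
            # Wait before retrying so the server can free a connection.
            sleep(delay)
        job_id = submit()
        attempts += 1
    return job_id, attempts


def make_flaky_submit(failures, job_id="42.example"):
    """Build a stand-in for pbs_submit that fails a fixed number of times."""
    remaining = {"count": failures}

    def submit():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return None  # simulated "No free connections" failure
        return job_id

    return submit
```

In pbs.py the callable would be something like
lambda: pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None),
and the existing "check to see if it submitted" code would still handle
the case where every attempt failed.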

Cheers,
Andrew


> Louise-Amélie Schmitt wrote:
>> Hello everyone
>>
>> I observed an issue when flagstat is incorporated in a workflow in which
>> the BAM file it works on is also used by another program (generate
>> pileup for instance) and is NOT the input dataset (generated by sam to
>> bam within the workflow).
>>
>> I tested it with the local job runner and with TORQUE (with the pbs
>> scheduler and Maui).
>>
>> - With the local job runner, it works just fine
>>
>> - With TORQUE I get the following error message:
>> pbs_submit failed, PBS error 15033: No free connections
>
> Hi,
>
> This can most likely be fixed by increasing the value of NCONNECTS in
> the TORQUE source, in src/include/libpbs.h, and recompiling on your
> TORQUE server.  I haven't seen a problem after increasing the value to
> 20.
>
> --nate
>
>> Surprisingly, other non-linear workflows work fine. I only observed this
>> error with flagstat. Moreover, when flagstat is in a linear workflow, it
>> works fine too. And if it is non-linear but the input dataset is the bam
>> file flagstat works on, it works fine too.
>>
>> Please find attached one of the test workflows where I found the error.
>> The input dataset is a sam file.
>>
>> Any clue?
>>
>> Cheers,
>> LA
>> {
>>     "a_galaxy_workflow": "true",
>>     "annotation": "to see if it fails if not forked",
>>     "format-version": "0.1",
>>     "name": "test flagstat",
>>     "steps": {
>>         "0": {
>>             "annotation": "",
>>             "id": 0,
>>             "input_connections": {},
>>             "inputs": [
>>                 {
>>                     "description": "",
>>                     "name": "Input Dataset"
>>                 }
>>             ],
>>             "name": "Input dataset",
>>             "outputs": [],
>>             "position": {
>>                 "left": 200,
>>                 "top": 200
>>             },
>>             "tool_errors": null,
>>             "tool_id": null,
>>             "tool_state": "{\"name\": \"Input Dataset\"}",
>>             "tool_version": null,
>>             "type": "data_input",
>>             "user_outputs": []
>>         },
>>         "1": {
>>             "annotation": "",
>>             "id": 1,
>>             "input_connections": {
>>                 "source|input1": {
>>                     "id": 0,
>>                     "output_name": "output"
>>                 }
>>             },
>>             "inputs": [],
>>             "name": "SAM-to-BAM",
>>             "outputs": [
>>                 {
>>                     "name": "output1",
>>                     "type": "bam"
>>                 }
>>             ],
>>             "position": {
>>                 "left": 274.5,
>>                 "top": 307
>>             },
>>             "tool_errors": null,
>>             "tool_id": "sam_to_bam",
>>             "tool_state": "{\"source\": \"{\\\"index_source\\\": \\\"cached\\\", \\\"input1\\\": null, \\\"__current_case__\\\": 0}\", \"__page__\": 0}",
>>             "tool_version": "1.1.1",
>>             "type": "tool",
>>             "user_outputs": []
>>         },
>>         "2": {
>>             "annotation": "",
>>             "id": 2,
>>             "input_connections": {
>>                 "input1": {
>>                     "id": 1,
>>                     "output_name": "output1"
>>                 }
>>             },
>>             "inputs": [],
>>             "name": "flagstat",
>>             "outputs": [
>>                 {
>>                     "name": "output1",
>>                     "type": "txt"
>>                 }
>>             ],
>>             "position": {
>>                 "left": 396.5,
>>                 "top": 445
>>             },
>>             "tool_errors": null,
>>             "tool_id": "samtools_flagstat",
>>             "tool_state": "{\"__page__\": 0, \"input1\": \"null\"}",
>>             "tool_version": "1.0.0",
>>             "type": "tool",
>>             "user_outputs": []
>>         },
>>         "3": {
>>             "annotation": "",
>>             "id": 3,
>>             "input_connections": {
>>                 "refOrHistory|input1": {
>>                     "id": 1,
>>                     "output_name": "output1"
>>                 }
>>             },
>>             "inputs": [],
>>             "name": "Generate pileup",
>>             "outputs": [
>>                 {
>>                     "name": "output1",
>>                     "type": "tabular"
>>                 }
>>             ],
>>             "position": {
>>                 "left": 519,
>>                 "top": 340
>>             },
>>             "tool_errors": null,
>>             "tool_id": "sam_pileup",
>>             "tool_state": "{\"__page__\": 0, \"c\": \"{\\\"consensus\\\": \\\"no\\\", \\\"__current_case__\\\": 0}\", \"indels\": \"\\\"no\\\"\", \"refOrHistory\": \"{\\\"input1\\\": null, \\\"reference\\\": \\\"indexed\\\", \\\"__current_case__\\\": 0}\", \"lastCol\": \"\\\"no\\\"\", \"mapCap\": \"\\\"60\\\"\"}",
>>             "tool_version": "1.1.1",
>>             "type": "tool",
>>             "user_outputs": []
>>         }
>>     }
>> }
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/
