On Thu, Nov 15, 2012 at 11:21 AM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
> On Thu, Nov 15, 2012 at 10:12 AM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
>> On Thu, Nov 15, 2012 at 10:06 AM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
>>> Hi all,
>>>
>>> Something has changed in the job handling, and in a bad way. On my
>>> development machine submitting jobs to the cluster didn't seem to be
>>> working anymore (never sent to SGE). I killed Galaxy and restarted:
>>> ...
>>> (segmentation fault)
>>
>> Looking into the problem with submitting the jobs, there seems to be
>> a problem with task splitting somehow recursing - the same file is
>> split four times, the filename getting longer and longer:
>
> Turning off task splitting I could run the same job OK on SGE.
>
> So, the good news is the problems seem to be specific to the
> task splitting code. Also I have reproduced the segmentation
> fault when restarting Galaxy (after stopping Galaxy with one
> of these broken jobs).
>
> Starting server in PID 17996.
> serving on http://127.0.0.1:8081
> galaxy.jobs.runners.drmaa ERROR 2012-11-15 11:07:27,762 (327/None) Unable to check job status
> Traceback (most recent call last):
>   File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py", line 296, in check_watched_items
>     state = self.ds.jobStatus( job_id )
>   File "/mnt/galaxy/galaxy-central/eggs/drmaa-0.4b3-py2.6.egg/drmaa/__init__.py", line 522, in jobStatus
>     _h.c(_w.drmaa_job_ps, jobName, _ct.byref(status))
>   File "/mnt/galaxy/galaxy-central/eggs/drmaa-0.4b3-py2.6.egg/drmaa/helpers.py", line 213, in c
>     return f(*(args + (error_buffer, sizeof(error_buffer))))
>   File "/mnt/galaxy/galaxy-central/eggs/drmaa-0.4b3-py2.6.egg/drmaa/errors.py", line 90, in error_check
>     raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
> InvalidArgumentException: code 4: Job id, "None", is not a valid job id
> galaxy.jobs.runners.drmaa WARNING 2012-11-15 11:07:27,764 (327/None) job will now be errored
> ./run.sh: line 86: 17996 Segmentation fault      (core dumped) python ./scripts/paster.py serve universe_wsgi.ini $@
>
> The problem is that the job_id variable in check_watched_items() is
> "None" (note this is a string, not the Python special object None).
>
> Peter

Is anyone else seeing this? I am wary of applying the update to our
production Galaxy until I know how to resolve this (other than just
disabling task splitting).

Thanks,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
