Hi again,

I have looked into this matter a little bit more, and it looks like this is happening:

- tasked job is split
- task commands are sent to workers (I am running 8-core high-CPU extra large workers on EC2)
- per task, worker runs env.sh for the respective tool
- per task, worker runs scripts/extract_dataset_part.py
- this script issues import statements (the ones for simplejson and galaxy.model.mapping have caused me problems), which lead to .so libraries being unzipped from python eggs into the nodes' /home/galaxy/.python-eggs
- the unzip runs into lib/pkg_resources.py and its _bypass_ensure_directory method, which creates the temporary dir for the egg unzip
- since there are 8 processes on the node, this method sometimes tries to mkdir a directory that another process created just after this one's isdir check
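
To illustrate the suspected race, here is a minimal sketch of a directory-creation helper that tolerates a concurrent creator. It is not the pkg_resources code; ensure_directory is a hypothetical name, and the sketch only assumes that the failure is a plain "check with isdir, then mkdir" race between processes.

```python
import errno
import os


def ensure_directory(path):
    """Create `path` (and any parents) if missing, tolerating concurrent creators.

    Hypothetical helper for illustration; not part of pkg_resources or Galaxy.
    """
    try:
        os.makedirs(path)
    except OSError as exc:
        # Another process may have created the directory between an earlier
        # existence check and this call. Treat that as success, but still
        # fail if the path exists and is not a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

The point of the design is to skip the separate isdir test as a guard and let mkdir itself report the collision, so that an [Errno 17] from a sibling process is treated as success.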

That last point is a guess on my part. I don't really know how to solve this in a non-hackish way, so until someone finds a proper fix, I may read from an 'eggs_extracted.txt' file to determine whether the eggs have already been extracted, locking the file when writing to it of course.
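
A rough sketch of that workaround could look like the following. It assumes a Unix worker (fcntl is not available on Windows); extract_once and the extract callable are hypothetical names, where extract stands for whatever triggers the egg unzip, for example the problematic imports.

```python
import fcntl
import os


def extract_once(marker_path, extract):
    """Run `extract()` at most once per node, guarded by an exclusive file lock.

    Returns True if this process did the extraction, False if the marker
    file shows that an earlier process already did it.
    """
    # Append mode creates the file if it is missing and never truncates it.
    with open(marker_path, "a+") as marker:
        # Blocks until every other process holding the lock has released it.
        fcntl.flock(marker, fcntl.LOCK_EX)
        try:
            marker.seek(0)
            if marker.read().strip() == "done":
                return False
            extract()
            marker.write("done\n")
            marker.flush()
            os.fsync(marker.fileno())
            return True
        finally:
            fcntl.flock(marker, fcntl.LOCK_UN)
```

Each task would call something like extract_once('/home/galaxy/eggs_extracted.txt', do_imports) before its real work, where the marker location and do_imports are again assumptions. The other seven processes on the node simply wait on the lock and then skip the extraction.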

cheers,
jorrit

On 09/14/2012 10:57 AM, Jorrit Boekel wrote:
Dear list,

I am running galaxy-dist on Amazon EC2 through Cloudman, and am using enable_tasked_jobs to run jobs in parallel. Yes, I know it's not recommended in production. My jobs usually get split into 72 parts, and sometimes (not always, maybe in 30-50% of cases) errors are returned concerning the python egg cache, usually:

[Errno 17] File exists: '/home/galaxy/.python-eggs'

or something like

[Errno 17] File exists: '/home/galaxy/.python-eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg-tmp'

AFAIK the errors arise when scripts/extract_dataset_part.py is run. I am guessing that the tmp python egg dir is created for every one of the 72 tasks mentioned, that these attempts sometimes coincide, and that this leads to the error.

I would like to solve this problem, but before doing so, I'd like to know if someone else has already fixed it in a galaxy-central changeset.

cheers,
jorrit