[galaxy-dev] dynamically send jobs to second cluster on high load
Hi,

The admin pages state that it is possible to specify multiple clusters in the universe file. Currently, we are investigating whether we can couple the university HPC platform to Galaxy to handle usage peaks. It would be ideal if the job manager would check the load of the dedicated cluster (e.g. queue length) and send jobs to the second cluster when the load is above a threshold. Does such an approach exist already, or will it become available in the near future? As far as I understand, it is currently only possible to specify which jobs run on which cluster, without dynamic switching?

Best regards,
Geert

--
Geert Vandeweyer, Ph.D.
Department of Medical Genetics
University of Antwerp
Prins Boudewijnlaan 43
2650 Edegem
Belgium
Tel: +32 (0)3 275 97 56
E-mail: geert.vandewe...@ua.ac.be
http://ua.ac.be/cognitivegenetics
http://www.linkedin.com/pub/geert-vandeweyer/26/457/726

___
Please keep all replies on the list by using reply all in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Re: [galaxy-dev] The user creation and login script can be injected with executable javascript in Galaxy
Hanfei,

I'd be happy to take a look at the report and share it with the rest of the team if you'd like to send it directly to me. Regarding SSL, this is definitely something that you can set up for your own instance; see the documentation for configuring proxies on the wiki: http://wiki.g2.bx.psu.edu/Admin/Config/Performance/nginx%20Proxy.

Thanks!
-Dannon

On Sep 24, 2012, at 12:01 AM, Hanfei Sun ad9...@gmail.com wrote:

Hello Galaxy-team,

A Galaxy instance is hosted on our server. Last week, a security expert ran some tests on it. He warned us that the user creation and login script can be injected with executable JavaScript in Galaxy, which may make our server vulnerable. He gave us a three-page report (other issues include non-SSL passwords and the Galaxy cookie). We don't know whether these issues are serious or whether we need to fix them immediately. Will Galaxy be updated to address them, or do we need to fix them ourselves? Any suggestion is appreciated. Thanks!

--
Hanfei Sun
Sent with Sparrow
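For the SSL piece Dannon mentions, a minimal sketch of an nginx server block terminating SSL in front of Galaxy (the server name, certificate paths, and the assumption that Galaxy listens on port 8080 are all placeholders; see the wiki page above for the full proxy setup):

```nginx
server {
    listen 443 ssl;
    server_name galaxy.example.org;

    # Placeholder certificate paths; point these at your own cert and key.
    ssl_certificate     /etc/nginx/ssl/galaxy.crt;
    ssl_certificate_key /etc/nginx/ssl/galaxy.key;

    location / {
        # Forward to the Galaxy paste server.
        proxy_pass http://localhost:8080;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
    }
}
```

With SSL terminated at the proxy, passwords and session cookies are no longer sent in the clear, which addresses the non-SSL findings in the report.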
[galaxy-dev] Error when running cleanup_datasets.py
Hello,

I am trying to run the cleanup scripts on my local installation but get stuck when trying to run the following:

./scripts/cleanup_datasets/cleanup_datasets.py universe_wsgi.ini -d 10 -5 -r

Deleting library dataset id 7225
Traceback (most recent call last):
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 524, in <module>
    if __name__ == "__main__": main()
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 124, in main
    purge_folders( app, cutoff_time, options.remove_from_disk, info_only = options.info_only, force_retry = options.force_retry )
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 247, in purge_folders
    _purge_folder( folder, app, remove_from_disk, info_only = info_only )
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 497, in _purge_folder
    _purge_folder( sub_folder, app, remove_from_disk, info_only = info_only )
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 497, in _purge_folder
    _purge_folder( sub_folder, app, remove_from_disk, info_only = info_only )
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 495, in _purge_folder
    _purge_dataset_instance( ldda, app, remove_from_disk, info_only = info_only ) # mark a DatasetInstance as deleted, clear associated files, and mark the Dataset as deleted if it is deletable
  File "./scripts/cleanup_datasets/cleanup_datasets.py", line 376, in _purge_dataset_instance
    ( dataset_instance.__class__.__name__, dataset_instance.id, dataset_instance.dataset.id )
AttributeError: 'NoneType' object has no attribute 'id'

Any help would be much appreciated.

Thanks,
Liisa
[galaxy-dev] Can't edit Galaxy Workflow _ElementInterface instance has no attribute 'render'
Hello,

After updating to the Sept. 07 distribution I am having problems editing an existing workflow.

Server error:

URL: http://galaxy_url/workflow/load_workflow?id=ba751ee0539fff04&_=1348501448807
Module paste.exceptions.errormiddleware:143 in __call__
  app_iter = self.application(environ, start_response)
Module paste.debug.prints:98 in __call__
  environ, self.app)
Module paste.wsgilib:539 in intercept_output
  app_iter = application(environ, replacement_start_response)
Module paste.recursive:80 in __call__
  return self.application(environ, start_response)
Module paste.httpexceptions:632 in __call__
  return self.application(environ, start_response)
Module galaxy.web.framework.base:160 in __call__
  body = method( trans, **kwargs )
Module galaxy.web.framework:69 in decorator
  return simplejson.dumps( func( self, trans, *args, **kwargs ) )
Module galaxy.web.controllers.workflow:735 in load_workflow
  'tooltip': module.get_tooltip( static_path=url_for( '/static' ) ),
Module galaxy.workflow.modules:262 in get_tooltip
  return self.tool.help.render( static_path=static_path )
AttributeError: _ElementInterface instance has no attribute 'render'

Any help would be much appreciated.

Thanks in advance,
Liisa
Re: [galaxy-dev] python egg cache exists error
For Test/Main, I have the user's ~/.bash_profile set $PYTHON_EGG_CACHE on a per-node basis. This could also be done per-node and per-pty to ensure uniqueness per job.

--nate

On Sep 18, 2012, at 11:24 AM, James Taylor wrote:

Interesting. If I'm reading this correctly, the problem is happening inside pkg_resources? (galaxy.eggs unzips eggs, but I think it does so at install [fetch_eggs] time, not run time, which would avoid this.) If so, this would seem to be a locking bug in pkg_resources. Dannon, we could put a guard around the imports in extract_dataset_part.py as an (overly aggressive and hacky) fix.

-- jt

On Tue, Sep 18, 2012 at 10:37 AM, Jorrit Boekel jorrit.boe...@scilifelab.se wrote:

- which leads to unzipping .so libraries from python eggs into the nodes' /home/galaxy/.python-eggs
- this runs into lib/pkg_resources.py and its _bypass_ensure_directory method, which creates the temporary dir for the egg unzip
- since there are 8 processes on the node, this method sometimes tries to mkdir a directory that was just made by another process after the isdir check.
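A sketch of what such a ~/.bash_profile fragment might look like (the base path is an assumption; the hostname keeps caches separate per node, and appending the shell PID would make them unique per job):

```shell
# Keep egg extraction caches separate per node to avoid the
# concurrent-unzip race in pkg_resources described below.
export PYTHON_EGG_CACHE="$HOME/.python-eggs/$(hostname)"
# For per-job uniqueness, append the shell PID as well:
# export PYTHON_EGG_CACHE="$HOME/.python-eggs/$(hostname)-$$"
mkdir -p "$PYTHON_EGG_CACHE"
```

Per-job directories trade a little disk churn for never having two processes extracting into the same cache at once.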
Re: [galaxy-dev] How to rotate Galaxy log file
On Sep 19, 2012, at 9:50 AM, Jennifer Jackson wrote (repost to galaxy-dev):

On 9/7/12 6:39 PM, Lukasz Lacinski wrote:

Dear All,

I use an init script that comes with Galaxy in the contrib/ subdirectory to start Galaxy. The log file (--log-file /home/galaxy/galaxy.log) specified in the script grows really quickly. How can I rotate it?

Thanks,
Lukasz

Hi Lukasz,

I'd suggest using whatever log rotation utility is provided by your OS. You'll need to restart the Galaxy process to begin writing to the new log once the old one has been rotated.

--nate

--
Jennifer Jackson
http://galaxyproject.org
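On Linux, the OS utility Nate refers to would typically be logrotate. A hedged sketch of a stanza for this setup (the log path comes from Lukasz's message; the rotation schedule and options are assumptions):

```
/home/galaxy/galaxy.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}
```

The copytruncate option lets Galaxy keep writing to the same file descriptor, so no restart is needed, at the cost of possibly losing a few log lines during the copy. Alternatively, drop copytruncate and restart Galaxy from a postrotate script, in line with Nate's advice.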
Re: [galaxy-dev] Automatic installation of third party dependancies
Hi Lance,

On Sep 21, 2012, at 6:04 PM, Lance Parsons wrote:

OK, I was able to get a new version installed. It seems there are two issues:

1) New revisions with the same version invalidate previous revisions. This means that Galaxy servers with the old, and now invalid, revisions are not able to update the tool (nor install it again).

I'm not quite sure what you're stating here. Do the following tool shed wiki pages clarify the behavior you are seeing?

http://wiki.g2.bx.psu.edu/ToolShedRepositoryFeatures#Pushing_changes_to_a_repository_using_hg_from_the_command_line
http://wiki.g2.bx.psu.edu/RepositoryRevisions#Installable_repository_changeset_revisions

2) Pushes from Mercurial (even version 2.3.3) do not seem to trigger metadata refreshes in the tool shed; however, uploads of tar.gz files do.

I am not able to reproduce this behavior. In my environment, metadata is always automatically generated for new changesets I push to my local tool shed (or the test tool shed) from the command line. What is the result of typing the following in the environment from which you are pushing changes to the tool shed?

$ hg --version

You should see something like the following, showing that you are running at least hg version 2.2.3:

gvk:/tmp/repos/convert_chars gvk$ hg --version
Mercurial Distributed SCM (version 2.2.3)
(see http://mercurial.selenic.com for more information)
Copyright (C) 2005-2012 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Hope this helps.

Lance Parsons wrote:

I've run into this issue again, and I'm having a hard time working around it. However, I have confirmed that at least some updates to a tool in the tool shed will invalidate previously valid revisions and thus prevent users from installing or updating the tool at all. For example, push version 0.1 of the tool and create a valid revision 1:xx. Then install the tool in Galaxy.
Make a small change (say, to tool_dependencies.xml) and push a new revision (but keep the tool version the same), now at revision 2:xxx. The tool shed will show 2:xx as the only valid revision to install, but the Galaxy system with revision 1:xx will be stuck, unable to get upgrades (the Server Error described previously). I'm trying to work around this now with my htseq-count tool, but so far no luck. I've created a few spurious revisions in the attempt, and I think now I may just try bumping the version (already did, to no avail; the tool shed still thinks it's the same) and uploading a tar file. That seems to parse metadata more reliably. Will let you know what, if anything, works. Thanks.

Greg Von Kuster wrote:

Hello Lance,

I've just committed a fix for getting updates to installed tool shed repositories in changeset 7713:23107188eab8, which is currently available only in the Galaxy central repository. However, my fix will probably not correct the issue you're describing, and I'm still not able to reproduce this behavior. See my inline comments...

On Sep 13, 2012, at 4:41 PM, Lance Parsons wrote:

Actually, I think that is exactly the issue. I DO have 3:f7a5b54a8d4f installed. I've run into a related issue before, but didn't fully understand it. I believe what happened was:

1) I pushed revision 3:f7a5b54a8d4f to the tool shed, which contained the first revision of version 0.2 of the htseq-count tool.
2) I installed the htseq-count tool from the tool shed, getting revision 3:f7a5b54a8d4f.
3) I pushed an update to version 0.2 of the htseq-count tool. The only changes were to tool dependencies, so I thought it would be safe to leave the version number alone (perhaps this is the problem?).

You are correct in stating that the tool version number should not change just because you've added a tool_dependencies.xml file. This is definitely not causing the behavior you're describing.

4) I attempted to get updates and ran into the issue I described.
I also ran into this (I believe it was with freebayes, but I'm not sure) when I removed (uninstalled) a particular revision of a tool. Then the tool was updated. I went to install it, and it said that I already had a previous revision installed and should install that. However, I couldn't, since the tool shed won't allow installation of old revisions of the same version of a tool.

The following section of the tool shed wiki should provide the details about why you are seeing this behavior. Keep in mind that you will only get certain updates to installed repositories from the tool shed. This behavior enables updates to installed tool versions. To get a completely new version of an installed tool (if one exists), you need to install a new (different) changeset revision from the tool shed repository.
[galaxy-dev] When will the API allow setting of parameters (not inputs) from the API
Hi,

One of the biggest hurdles for the implementation in our institute is the inability of the Galaxy API to set parameters at run time. You only seem to be able to set inputs, not parameters... Is there any ETA on when this will be available? Is this even a priority?

Thanks!

Regards,
Thon de Boer, Ph.D.
Bioinformatics Guru
+1-650-799-6839
thondeb...@me.com
LinkedIn Profile
Re: [galaxy-dev] dynamically send jobs to second cluster on high load
Hello Geert,

I don't believe any such functionality is available out of the box, but I am confident clever use of dynamic job runners (http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-June/010080.html) could solve this problem.

One approach would be to move all of your job runners out of galaxy:tool_runners, maybe to a new section called galaxy:tool_runners_local, and then create another set of runners for your HPC resource (maybe galaxy:tool_runners_hpc). Next, set your default_cluster_job_runner to dynamic:///python/default_runner and create a python function called default_runner in lib/galaxy/jobs/rules/200_runners.py. The outline of this file might be something like this:

    from ConfigParser import ConfigParser

    def default_runner(tool_id):
        if _local_queue_busy():
            runner = _get_runner("galaxy:tool_runners_hpc", tool_id)
        else:
            runner = _get_runner("galaxy:tool_runners_local", tool_id)
        if not runner:
            runner = "local://"  # Or whatever default behavior you want.
        return runner

    def _local_queue_busy():
        # TODO: check local queue, would need to know more...
        pass

    def _get_runner(runner_section, tool_id):
        universe_config_file = "universe_wsgi.ini"
        parser = ConfigParser()
        parser.read(universe_config_file)
        job_runner = None
        if parser.has_option(runner_section, tool_id):
            job_runner = parser.get(runner_section, tool_id)
        return job_runner

You could tweak the logic here to do things like only submit certain kinds of jobs to the HPC resource, or specify different default runners for each location. Hopefully this is helpful. If you want more help defining this file, I could fill in the details if I knew more precisely what behavior you wanted for each queue and what the command line is for determining whether the dedicated Galaxy resource is busy (or maybe just what queue manager you are using, if any). Let me know if you go ahead and get this working; I am eager to hear success stories.
-John

John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917 Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net
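To fill in the _local_queue_busy TODO from John's outline, a hedged sketch assuming a PBS/Torque-style qstat whose output is two header lines followed by one line per job (the command name, the threshold, and the output format are all assumptions about your site; adjust for SGE, SLURM, etc.):

```python
import subprocess

QUEUE_THRESHOLD = 50  # Assumed cutoff; tune to your cluster.

def count_queued_jobs(qstat_output):
    # Count non-header lines in qstat output; assumes the PBS/Torque
    # format of two header lines followed by one line per job.
    lines = [l for l in qstat_output.strip().splitlines() if l.strip()]
    return max(0, len(lines) - 2)

def _local_queue_busy():
    try:
        output = subprocess.check_output(["qstat"])
        return count_queued_jobs(output.decode()) > QUEUE_THRESHOLD
    except (OSError, subprocess.CalledProcessError):
        # qstat unavailable or failed; fall back to the local queue.
        return False
```

Separating the parsing (count_queued_jobs) from the qstat invocation makes the busy check easy to unit-test without a live scheduler.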