Re: [galaxy-dev] python job manager

2011-03-16 Thread Ry4an Brase
On Wed, Mar 16, 2011 at 10:46:54AM -0400, James Lindsay wrote:
> Hi,
> I run galaxy on a large SMP university machine. The machine is used by some 
> folks for command line work, and others via galaxy. I was wondering if 
> anyone had integrated into galaxy a job manager that monitors CPU load 
> averages, and only runs new jobs when cpu resources are available?

James, you could achieve this using the galaxy cluster configurations.
Even with a single (SMP) machine in use there's value in creating (for
example) a torque queue on that machine and using Galaxy's torque
support to have it submit jobs to that queue.  Your non-Galaxy command
line users can then also use the 'qsub' command to launch their jobs,
and Torque will be able to balance resources across them according to
your preferences.

Having your galaxy jobs use the pbs (torque) job runner has the
additional benefit of being able to restart galaxy without the jobs
losing their parent process and dying.

https://bitbucket.org/galaxy/galaxy-central/wiki/Config/ProductionServer

https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
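
For comparison, the load-average check described in the original question
can be sketched in a few lines of Python.  This is only an illustration
of the idea, not something Galaxy or Torque ships: the function names and
the 0.8 per-CPU threshold are assumptions, and a real scheduler such as
Torque does considerably more than this.

```python
import os


def has_spare_cpu(load_1min, cpu_count, max_load_per_cpu=0.8):
    """Return True when the 1-minute load average leaves headroom.

    max_load_per_cpu is an assumed tuning knob, not a Galaxy setting.
    """
    return load_1min < cpu_count * max_load_per_cpu


def host_has_spare_cpu(max_load_per_cpu=0.8):
    """Apply the same check to the machine this script runs on (Unix only)."""
    load_1min, _load_5min, _load_15min = os.getloadavg()
    cpus = os.cpu_count() or 1
    return has_spare_cpu(load_1min, cpus, max_load_per_cpu)
```

A poller would call host_has_spare_cpu() before launching each job.  The
weakness is that it only gates Galaxy's own jobs, whereas a shared queue
also arbitrates the command line users' work.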

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


[galaxy-dev] NoneType dereference on the jobs view

2011-03-10 Thread Ry4an Brase
Intermittently, and always during periods of high load, we get a 500
Server Error from the Admin 'Manage Jobs' list.  In the logs the
stacktrace looks like: http://paste.pocoo.org/show/351374/

Attached is the patch JJ provided to work around jobs without histories,
but I thought I'd bring it up here too in case either others are seeing
it or someone knows a root cause.

Thanks!

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu

# HG changeset patch
# User JJ 
# Date 1299791693 21600
# Node ID c9e807155b6d4c43e609e7bbae42060f8dc32fa0
# Parent  c8c7eb5ec4200201c66a833abc3aa2c03e8e
Check for NoneType history.

Tried to look at the jobs listing for galaxy and got a server error:

Error - : 'NoneType' object has no attribute 'user'
URL: https://galaxy.msi.umn.edu/admin/jobs
...
File '/website/galaxy.msi.umn.edu/PRODUCTION/database/compiled_templates/admin/jobs.mako.py', line 84 in render_body
  if job.history.user:
AttributeError: 'NoneType' object has no attribute 'user'


Evidently, there is a job without an associated history.  So, I added a check for
job.history:

diff -r c8c7eb5ec420 -r c9e807155b6d templates/admin/jobs.mako
--- a/templates/admin/jobs.mako	Fri Feb 18 14:53:35 2011 -0500
+++ b/templates/admin/jobs.mako	Thu Mar 10 15:14:53 2011 -0600
@@ -47,7 +47,7 @@
 %endif
 
 ${job.id}
-%if job.history.user:
+%if job.history and job.history.user:
 ${job.history.user.email}
 %else:
 anonymous
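
The same guard, written as plain Python rather than Mako, looks roughly
like the sketch below.  The helper name job_owner_email and the
SimpleNamespace stand-ins are illustrative assumptions, not Galaxy's
model classes; the point is only that both job.history and
job.history.user may be None.

```python
from types import SimpleNamespace


def job_owner_email(job):
    """Return the owning user's email, or 'anonymous' when the job has
    no history or the history has no user."""
    history = getattr(job, "history", None)
    user = getattr(history, "user", None)
    return user.email if user else "anonymous"


# Stand-in objects: a job whose history was lost, and a normal one.
orphan_job = SimpleNamespace(id=1, history=None)
owned_job = SimpleNamespace(
    id=2,
    history=SimpleNamespace(user=SimpleNamespace(email="user@example.org")),
)
```

This works around the error in the listing; it does not explain how a
job ends up without a history in the first place.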

[galaxy-dev] Galaxy job runner queue management questions

2011-03-10 Thread Ry4an Brase
As use of our Galaxy installation is picking up, we're getting a lot of
requests for greater fairness and transparency in the Galaxy job runner
area.

As I understand things the primary tool Galaxy gives us to affect
processing order and wait times with our torque-based setup is the
ability to map specific tools to varying queues or to keep them on a
local-runner.

On one end of the spectrum I could see a simple division of
small/fast/light jobs on local and big/heavy/slow jobs on a single
cluster queue.  On the other extreme one could set up a queue per tool
and use sophisticated queue management stuff on the torque side of
things to balance capacity across tools, users, expected processing
time, etc.
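
To make the two extremes concrete, here is a small sketch of the mapping
idea as a lookup table.  Everything in it is hypothetical: the tool ids
other than Cut1, the queue names, and the destination strings are made up
for illustration and are not a copy of any real Galaxy configuration.

```python
# Hypothetical destinations; the queue names are invented for this sketch.
DEFAULT_DESTINATION = "pbs:///batch"

TOOL_DESTINATIONS = {
    "Cut1": "local:///",           # small/fast/light: keep on the web host
    "upload1": "local:///",
    "tophat": "pbs:///long",       # big/heavy/slow: dedicated queue
    "bowtie_wrapper": "pbs:///long",
}


def destination_for(tool_id):
    """Pick where a tool's jobs run, falling back to the default queue."""
    return TOOL_DESTINATIONS.get(tool_id, DEFAULT_DESTINATION)
```

The simple division needs only two distinct values in the table; the
queue-per-tool extreme gives every key its own queue and leaves the
balancing to the Torque side.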

How are other sites handling this?

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] divide fq into 2

2011-03-09 Thread Ry4an Brase
On Tue, Mar 08, 2011 at 11:15:44PM -0500, Musa A. Hassan wrote:
> Yes I can't get the file into galaxy at all. Am uploading from a file
> path. the file is  35mb.

When you say "uploading from a filepath" are you using the
administrator-only functionality explained here:

https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFiles

_not_ the 'Get Data' -> 'Upload File' selection from the 'Tools' menu?

People work with files _much_ larger than that in Galaxy all the time.

> Musa
> 
> From: Ry4an Brase [ry4an+gal...@msi.umn.edu]
> Sent: Tuesday, March 08, 2011 11:06 PM
> To: Musa A. Hassan
> Subject: Re: [galaxy-dev] divide fq into 2
> 
> On Tue, Mar 08, 2011 at 10:44:51PM -0500, Musa A. Hassan wrote:
> > Hi Ry4an,
> >
> > I'd like to do this in galaxy, but the problem is it wont load into
> > galaxy. As for using split, the file generated from this returns  a
> > length mismatch in say Tophat, maybe in the process of splitting the
> > file some changes happen to the format.
> 
> So you can't get the file into galaxy at all?  Are you trying to upload
> it through your browser (suitable only for non-huge files) or are you
> using 'upload from file path'?  How big (bytes) is the file?
> 
> Also, you should try to keep your replies on the mailing list so that
> others searching in the future find the same help.
> 
> --
> Ry4an Brase 612-626-6575
> Software Developer  Application Development
> University of Minnesota Supercomputing Institute    http://www.msi.umn.edu

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] Pass files as url

2011-03-08 Thread Ry4an Brase
On Tue, Mar 08, 2011 at 03:46:13PM -0500, Bonci, Timothy Daniel wrote:
> I have an interactive applet that visualizes data produced by Galaxy.
> The applet, being run client side, does not have access to the file
> system of the server, so passing history files by reference (path)
> won't work.  All the files are available directly to the browser
> through a url, but I can't figure out a way to get that url.
> Alternatively, If anyone could help me find the history.htm file
> internally I could have the program that preps the applet parse that
> for the urls.

Tom, there should be an easier way for you to get the URLs.  If you
configure your applet as an ExternalDisplayApplication:
https://bitbucket.org/galaxy/galaxy-central/wiki/ExternalDisplayApplications/Tutorial

you can define the tool using XML and get a public URL to the data file
passed directly to the URL invoking your display/applet.

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] divide fq into 2

2011-03-08 Thread Ry4an Brase
On Tue, Mar 08, 2011 at 03:43:20PM -0500, Musa A. Hassan wrote:
> Hi All,
> Is there anyone out there who knows how I can divide an fq file containing 
> illumina short reads randomly into 2 small files containing approximately 
> equal numbers of reads? I have a huge fq from the illumina high-seq platform; 
> unfortunately, this file is huge and is causing all sorts of problems and I'd 
> like to divide it into two equal sizes (based on number of reads).

I'm assuming you mean "in galaxy", right?  If so, check out the entries
in 'Text Manipulation'.  Using 'select first' and 'select last' you can
turn one dataset into two datasets each half the size.

If instead you mean on the Unix command line, use the tool 'split' or
'head' and 'tail'.
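
Whichever route you take, the cut has to fall on a record boundary.  The
sketch below assumes the common four-lines-per-read FASTQ layout and
assigns each read at random to one of two output files.  The function
name and its arguments are made up for illustration; it is not a Galaxy
tool.  A cut that lands inside a record is one plausible cause (an
assumption, not something confirmed here) of a downstream tool
complaining about mismatched lengths.

```python
import random


def split_fastq(src_path, out_a_path, out_b_path, seed=0):
    """Randomly assign each 4-line FASTQ record to one of two files.

    Returns (reads_in_a, reads_in_b).  Assumes every record is exactly
    four lines: header, sequence, '+' line, qualities.
    """
    rng = random.Random(seed)
    counts = [0, 0]
    with open(src_path) as src, \
            open(out_a_path, "w") as out_a, \
            open(out_b_path, "w") as out_b:
        outputs = (out_a, out_b)
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break  # clean end of file
            if not record[3]:
                raise ValueError("truncated FASTQ record at end of file")
            which = rng.randrange(2)
            outputs[which].writelines(record)
            counts[which] += 1
    return tuple(counts)
```

The file is streamed one record at a time, so memory use stays flat
however large the input is.  With 'split', the equivalent precaution is
to pass a line count that is a multiple of four.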

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] upload large data file

2011-03-04 Thread Ry4an Brase
On Fri, Mar 04, 2011 at 04:28:20PM +, Yanji Xu wrote:
> Dear Sir/Madam,
> 
> I installed galaxy in my local server, then I tried to upload a 4.7 Gb
> fastq file into galaxy, but failed.  Below is the error message.
> 
> OverflowError: signed integer is greater than maximum
> 
> How could I upload large data files into galaxy and process the data?

Use either the Upload from filepath mechanism available for data
libraries
(https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFiles),
which has you copy the file to the server in advance and then import it,
or set up the Upload via FTP functionality
(https://bitbucket.org/galaxy/galaxy-central/wiki/UploadViaFTP).

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


[galaxy-dev] Amazon cloud formation for galaxy

2011-02-25 Thread Ry4an Brase
Today's release of Amazon CloudFormation should make the Galaxy Cloud
stuff a bit easier:
http://aws.typepad.com/aws/2011/02/cloudformation-create-your-aws-stack-from-a-recipe.html

In theory, with a description file like this:

https://s3.amazonaws.com/cloudformation-templates/CloudFormationSample_WordPress.template

one can define an entire cluster to bring up from custom AMIs with a
single click.  Exciting.

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] Configuring remote galaxy job runners

2011-02-24 Thread Ry4an Brase
On Wed, Feb 23, 2011 at 10:58:18PM -0500, Nate Coraor wrote:
> It looks like your datatypes_conf.xml is out of date.  Have a look at
> the differences from datatypes_conf.xml.sample.

Ooof, I've been merging in regularly, but in a strictly additive
fashion.  

My disconnect was in not understanding why those warnings would abort a
job on a remote runner but not on the local runner.  I now realize it's
because the local runner isn't re-initializing the entire galaxy stack
and re-parsing the config file, so on local I see those warnings at
startup time, where they're not blocking a job.

> > Is it possible it's just the output on STDERR causing the job to fail,
> > and if so how do I shut that up when I'm running through qsub (so
> > redirect to /dev/null isn't quite right)?
> 
> Yes, anything output to STDERR will be considered a failure.  There is a
> ticket in Bitbucket for this (actually, you commented on it 8 months ago ;)
> 
>   
> https://bitbucket.org/galaxy/galaxy-central/issue/325/allow-tool-authors-to-decide-whether-to

Heh, not just commented but had the call to do a 'take' on it, though
the good folks who have done more with it since have taken it back.
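
The rule Nate describes can be illustrated with a short sketch.  This is
not Galaxy's runner code; run_and_classify is a made-up helper that
applies the same policy (any STDERR output means failure, regardless of
exit status) to an arbitrary command.

```python
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and label it the way the thread describes:
    any output on stderr marks the job as an error, even with exit code 0."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    state = "error" if proc.stderr else "ok"
    return state, proc.returncode


# A command that succeeds (exit 0) but prints a warning to stderr.
noisy = [sys.executable, "-c", "import sys; sys.stderr.write('WARNING: x\\n')"]
quiet = [sys.executable, "-c", "print('done')"]
```

Under this policy the noisy command is reported as failed although it
exited cleanly, which matches the behaviour of the 'cut' job above.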

Thanks!

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


[galaxy-dev] Configuring remote galaxy job runners

2011-02-23 Thread Ry4an Brase
I'm working on getting more of our jobs offloaded to other machines, and
I'm getting job failures I'm not able to debug.  When running a simple
tool like 'cut', submitted over qsub, I'm getting STDERR output like
this:

WARNING:galaxy.datatypes.registry:Error loading datatype "binseq.zip", problem: 'module' object has no attribute 'Binseq'
WARNING:galaxy.datatypes.registry:Error loading datatype "fastqc", problem: 'module' object has no attribute 'fastqc'
WARNING:galaxy.datatypes.registry:Error loading datatype "ssaha2_index", problem: 'module' object has no attribute 'SSAHA2Index'

and nothing on STDOUT.  I'm doing the "Unified Method" as described here
https://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster with
paths to datafiles and executables the same on the web runner and torque
worker systems.  I can successfully qsub trivial jobs ("ls") from the
web runner machine and see them executed remotely.  The web runner's
galaxy log doesn't show anything out of the norm:

galaxy.jobs INFO 2011-02-23 16:23:55,400 JobWrapper prepare 4019 Cut1 Ry4an perl /website/galaxy.msi.umn.edu/PRODUCTION/tools/filters/cutWrapper.pl /galaxy/PRODUCTION/database/files/019/dataset_19366.dat "c1,c2" T /galaxy/PRODUCTION/database/files/019/dataset_19823.dat
galaxy.jobs INFO 2011-02-23 16:24:04,559 JobWrapper state   4019 Cut1 running Ry4an
128.101.189.29 - - [23/Feb/2011:16:24:06 -0500] "POST /root/history_item_updates HTTP/1.1" 200 - "https://galaxy.msi.umn.edu/history" "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.84 Safari/534.13"
galaxy.jobs INFO 2011-02-23 16:24:07,771 JobWrapper finish  4019 Cut1 error Ry4an
galaxy.jobs INFO 2011-02-23 16:24:07,880 JobWrapper done    4019 Cut1 error Ry4an

I didn't see it in the Cluster config but I added the /lib
to the $PYTHONPATH just in case, but no luck.

Is it possible it's just the output on STDERR causing the job to fail,
and if so how do I shut that up when I'm running through qsub (so
redirect to /dev/null isn't quite right)?
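
For reference, the WARNING:name:message lines above have the shape of
Python's standard logging output, and a logger's threshold decides
whether they are emitted at all.  The sketch below shows that mechanism
in isolation, using only the standard library.  The helper name is made
up, and whether Galaxy's own configuration exposes such a setting is not
something I know; the logger name is simply copied from the warnings.

```python
import io
import logging


def warnings_emitted(level):
    """Return what a logger named like the one in the warnings writes
    when its threshold is set to `level`."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger = logging.getLogger("galaxy.datatypes.registry")
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.setLevel(level)
    try:
        logger.warning('Error loading datatype "fastqc"')
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

With the threshold at logging.WARNING the line is written; at
logging.ERROR nothing is.  Silencing a warning this way hides the
symptom only; the datatypes it complains about still fail to load.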

Thanks,


-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] histogram tool not working

2011-02-01 Thread Ry4an Brase
On Tue, Feb 01, 2011 at 08:51:25PM +, Peter wrote:
> On Tue, Feb 1, 2011 at 8:06 PM, David Hoover  wrote:
> > I just updated to the most recent version of Galaxy (hg pull -u), and now
> > the error is different:
> > An error occurred running this job: Error in hist.default(list(8, 6, 14, 8,
> > 10, 3, 8, 6, 3, 12, 12, 8, 8, :
> > 'x' must be numeric
> > What gives?
> > David
> 
> Hi,
> 
> R gives the error message 'x' must be numeric, so relevant questions
> are what version of R do you have, what version of rpy, and what
> version of Python (since Galaxy's tools tend to invoke R from
> Python via the rpy library - certainly the histogram tool does).

We had the same problem when we first installed Galaxy back in March:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-March/002199.html

Looking through both our ticketing system and our local hg commits I'm
not finding what JJ did to fix it, so I'm cc-ing him on this.

JJ, do you recall?

(Lately similar problems, which all seem to be rpy incorrectly inferring
the type of its arguments, have caused us to rewrite many of the
rpy-using tools to use rpy2, which the rpy developers say does a lot
less "guessing".  Again, JJ's driving the development on that and I
don't know where it's at.)

And everyone says "what gives?". :)

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu
___
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev


Re: [galaxy-dev] Changeset discrepancies

2011-01-24 Thread Ry4an Brase
On Mon, Jan 24, 2011 at 12:02:00PM -0500, lenta...@jimmy.harvard.edu wrote:
> Hi Galaxy Team,
> 
> I noticed that the main galaxy site is up to changeset 4919, but the
> Mercurial repository (http://www.bx.psu.edu/hg/galaxy) is only up to
> changeset 4640.  Why is there a discrepancy?  Did the repository move?

The http://www.bx.psu.edu/hg/galaxy URL is an alias for
https://bitbucket.org/galaxy/galaxy-dist which is the 'release'
repository.  The work-in-progress repository is
https://bitbucket.org/galaxy/galaxy-central which has 4920 changesets.
The galaxy site itself is usually somewhere between the two.

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


[galaxy-dev] bowtie map to BED conversions

2011-01-06 Thread Ry4an Brase
I've got a user request for a converter from bowtie's map output to BED
format, and looking at the provided script it's mostly just an
application of cut(1) and sort(1).

Is this something Galaxy already does through some mechanism we're not
finding or is this 3 line conversion script something I should be
adding and submitting back?
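
For concreteness, the core of such a conversion might look like the
sketch below.  It is not the user's script.  It assumes bowtie's default
tab-delimited output with the columns in this order: read name, strand,
reference name, 0-based leftmost offset, read sequence, and then further
columns that are ignored here.  The function name and the constant score
of 0 are my own choices.

```python
def bowtie_line_to_bed(line):
    """Convert one line of (assumed) default bowtie output to a BED6 line.

    BED uses 0-based, half-open intervals, so the bowtie offset is used
    as chromStart and offset + read length as chromEnd.
    """
    fields = line.rstrip("\n").split("\t")
    read_name, strand, chrom, offset, seq = fields[:5]
    start = int(offset)
    end = start + len(seq)
    return "\t".join([chrom, str(start), str(end), read_name, "0", strand])


example = "read1\t+\tchr1\t100\tACGTACGT\tIIIIIIII\t0\t"
```

Sorting the result by chromosome and start position is the remaining
step, which is where sort(1) comes in.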

Thanks,

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


Re: [galaxy-dev] genetrack behind https

2011-01-03 Thread Ry4an Brase
On Mon, Jan 03, 2011 at 09:49:59PM -0500, Daniel Blankenberg wrote:
> Hi Ry4an,
> 
> You are correct that Galaxy's GeneTrack integration requires running a
> local instance of GeneTrack. In the case of GeneTrack, files are
> accessed through a shared file system. I've added a note to clarify
> this in the wiki. 

Thanks for the clarification.  I should've figured it out from the
shared file access, but the UCSC viewers went in so easily I got overly
optimistic. ;)

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu


[galaxy-dev] genetrack behind https

2011-01-03 Thread Ry4an Brase
I'm trying to get genetrack working for bed data and when clicking on
the 'Genetrack' link for a bed format dataset I get a 500 Internal
Server Error django exception from http://genetrack.g2.bx.psu.edu saying
'Unable to validate key!'.  Example on our staging server:


http://genetrack.g2.bx.psu.edu/galaxy?filename=2f70726f6a6563742f67616c6178792d646174612f66696c65732f3030322f646174617365745f323639322e646174&hashkey=6005bb6978f963d1df79a20a92a3c2f144dbe1ff&input=458&GALAXY_URL=http://dbw-galaxy.msi.umn.edu/tool_runner%3Ftool_id%3Dpredict2genetrack

The GALAXY_URL I'm sending it decodes to:

http://dbw-galaxy.msi.umn.edu/tool_runner?tool_id=predict2genetrack

which redirects (302) to:

https://dbw-galaxy.msi.umn.edu/tool_runner?tool_id=predict2genetrack

However, I don't see a request for either in the Apache log.
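
For anyone wanting to check the decoding, the link can be taken apart
with Python's standard library as sketched below.  The variable names
are mine.  The filename parameter looks hex-encoded, and decoding it as
ASCII is an assumption that happens to yield a plausible dataset path.
How hashkey is computed is not something I know, so it is left alone.

```python
from urllib.parse import parse_qs, urlsplit

FILENAME_HEX = "2f70726f6a6563742f67616c6178792d646174612f66696c65732f3030322f646174617365745f323639322e646174"

LINK = (
    "http://genetrack.g2.bx.psu.edu/galaxy"
    "?filename=" + FILENAME_HEX +
    "&hashkey=6005bb6978f963d1df79a20a92a3c2f144dbe1ff"
    "&input=458"
    "&GALAXY_URL=http://dbw-galaxy.msi.umn.edu/tool_runner"
    "%3Ftool_id%3Dpredict2genetrack"
)

# parse_qs percent-decodes each value; keep the first value per key.
params = {key: values[0]
          for key, values in parse_qs(urlsplit(LINK).query).items()}

galaxy_url = params["GALAXY_URL"]
dataset_path = bytes.fromhex(params["filename"]).decode("ascii")

print(galaxy_url)
print(dataset_path)
```

The decoded filename is a path on our filesystem, which the remote
server would need to be able to read.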

Can the genetrack.g2.bx.psu.edu server be used by other galaxy
installations, as the UCSC visualizer can, in which case I'm running
afoul of my redirect and/or https setup?  Or should I have figured out
that I need my own genetrack server from
https://bitbucket.org/galaxy/galaxy-central/wiki/ExternalDisplayApplications/Tutorial
?

Thanks,

-- 
Ry4an Brase 612-626-6575
Software Developer  Application Development
University of Minnesota Supercomputing Institute    http://www.msi.umn.edu