[galaxy-dev] Question on setting metadata on upload via API

2011-05-05 Thread Duddy, John
I'm looking at extending the metadata fields for one of the supported file 
types. The files can get VERY large, and since I'm creating those files, I'd 
like to save as metadata some of the information I have on the contents.

Specifically, I'd like to tag the files with information about the sample as 
well as the byte offsets in the file for every millionth record (to facilitate 
fast file splitting). I saw the sample script for uploading a file to a library 
via the API. It looks to me like the most direct approach would be to modify 
LibrariesController.create() to handle arbitrary key-value pairs to be set on 
the data file.

However, that might not support things like having metadata values be arrays.
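
The offset index described above can be sketched as follows. This is a minimal illustration, not Galaxy code: the function name `record_offsets` is hypothetical, and it assumes each record is exactly four newline-terminated lines (as in unwrapped FASTQ). The result is a plain list of integers, so it could be serialized (for example as a JSON string) if the metadata store turns out not to accept arrays.

```python
import io

def record_offsets(stream, every=1000000, lines_per_record=4):
    """Return the byte offsets at which every `every`-th record starts.

    `stream` must be a binary file-like object. The fixed
    `lines_per_record` is an assumption (4 for unwrapped FASTQ).
    """
    offsets = []
    position = 0
    for line_number, line in enumerate(stream):
        # A new indexed record starts every (every * lines_per_record) lines.
        if line_number % (every * lines_per_record) == 0:
            offsets.append(position)
        position += len(line)
    return offsets

# Tiny demonstration: three 4-line records, indexed every 2 records.
demo = io.BytesIO(b"@r1\nAC\n+\n!!\n@r2\nGT\n+\n!!\n@r3\nTT\n+\n!!\n")
demo_offsets = record_offsets(demo, every=2)
```

A file splitter could then seek() straight to any stored offset instead of scanning the file from the start.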

So, does my understanding of the problem and the areas involved seem 
reasonable? Is there something already out there that does what I need? I saw 
the set_metadata.py script, but that looks like it is meant to operate directly 
on the files, which might not make a running instance of Galaxy too happy. Is 
there an API version of that (or something in one of the web controllers I can 
repurpose to my needs)?

Thanks!


John Duddy
Illumina, Inc.


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Question on setting metadata on upload via API

2011-05-06 Thread Duddy, John
So, to summarize:
1 - I can upload files via the API now, which returns the id
2 - If I don't upload via directory paths, the file upload will not be instant, 
so I'll have to use a poller to find out when the file upload is done
3 - I can write a new API to set metadata on a dataset and call that when the 
file is done. If I don't, the only metadata the file will have is the stuff 
created during set_meta().

Right?
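
The polling step in point 2 might look roughly like the sketch below. Everything here is an assumption for illustration: `wait_for_dataset` and `fetch_state` are hypothetical names, and the terminal state strings "ok" and "error" should be checked against what your Galaxy instance actually returns. The function takes a callable rather than a URL so the loop can be exercised without a live server.

```python
import time

def wait_for_dataset(fetch_state, poll_seconds=5.0, timeout_seconds=3600.0,
                     sleep=time.sleep):
    """Poll until fetch_state() reports a terminal state, then return it.

    fetch_state is a caller-supplied function (hypothetical here) that asks
    the API for the dataset and returns its state as a string.
    """
    waited = 0.0
    while True:
        state = fetch_state()
        if state in ("ok", "error"):  # assumed terminal states
            return state
        if waited >= timeout_seconds:
            raise RuntimeError("timed out waiting for upload, last state: %s" % state)
        sleep(poll_seconds)
        waited += poll_seconds

# Demonstration with a canned state sequence instead of a real server.
_states = iter(["queued", "running", "ok"])
final_state = wait_for_dataset(lambda: next(_states), sleep=lambda seconds: None)
```

Once the returned state is "ok", the second API call to set the extra metadata can be made with the dataset id from the upload response.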


John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu] 
Sent: Friday, May 06, 2011 8:12 AM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Question on setting metadata on upload via API

Duddy, John wrote:
> I'm looking at extending the metadata fields for one of the supported file 
> types. The files can get VERY large, and since I'm creating those files, I'd 
> like to save as metadata some of the information I have on the contents.
> 
> Specifically, I'd like to tag the files with information about the sample as 
> well as the byte offsets in the file for every millionth record (to 
> facilitate fast file splitting). I saw the sample script for uploading a file 
> to a library via the API. It looks to me like the most direct approach would 
> be to modify LibrariesController.create() to handle arbitrary key-value pairs 
> to be set on the data file.
> 
> However, that might not support things like having metadata values be arrays.
> 
> So, does my understanding of the problem and the areas involved seem 
> reasonable? Is there something already out there that does what I need? I saw 
> the set_metadata.py script, but that looks like it is meant to operate 
> directly on the files, which might not make a running instance of Galaxy too 
> happy. Is there an API version of that (or something in one of the web 
> controllers I can repurpose to my needs)?

Hi John,

Galaxy will still set its own required metadata elements (determined by
the file type) on upload, but if you create additional optional metadata
elements and don't set them in your datatype's set_meta(), you could
then set them after the job runs with a call to the API.  Editing
metadata via the API is not yet implemented, but shouldn't be too
difficult to add.

This does mean you'd have to get the newly created dataset ID and make a
second call to the API.  If that's not feasible it could probably be
incorporated at creation time, but would be trickier since the upload
code doesn't account for the possibility that metadata exists prior to
running the upload tool itself.

--nate



[galaxy-dev] Unable to set metadata in API call

2011-05-06 Thread Duddy, John
I need to be able to set some metadata in some custom data types. For now, I'm 
just trying to set the value of the 'misc_info' field. The client script is 
this:

    put( sys.argv[1], sys.argv[2], { 'update_type' : 'metadata', 'misc_info' : 'meta data msg' } )

and my API method is as follows. It executes fine, but the values do not show 
on the next show() call.
Disclaimer: Python is still quite new to me, and it's very likely I do not 
understand the metadata model and the way library dataset associations work.

Any clues will be greatly appreciated!

@web.expose_api
def update( self, trans, id, library_id, payload, **kwd ):
    """
    PUT /api/libraries/{encoded_library_id}/contents/{encoded_content_type_and_id}
    Sets attributes (metadata) on a library item.
    """
    update_type = None
    if 'update_type' not in payload:
        trans.response.status = 400
        return "Missing required 'update_type' parameter.  Please consult the API documentation for help."
    else:
        update_type = payload.pop( 'update_type' )
    # One-element tuple: ( 'metadata' ) is just a string, so "not in" would be a substring test.
    if update_type not in ( 'metadata', ):
        trans.response.status = 400
        return "Invalid value for 'update_type' parameter ( %s ) specified.  Please consult the API documentation for help." % update_type
    # The content id encodes both the item type and its database id, e.g. "file.<id>".
    content_id = id
    decoded_type_and_id = trans.security.decode_string_id( content_id )
    content_type, decoded_content_id = decoded_type_and_id.split( '.' )
    if content_type not in ( 'file', ):
        trans.response.status = 400
        return "Updates allowed only on files, not directories"
    try:
        content = trans.sa_session.query( trans.app.model.LibraryDatasetDatasetAssociation ).get( decoded_content_id )
    except Exception:
        content = None
    # Only admins or users allowed to modify the library item may update it.
    if not content or ( not trans.user_is_admin() and not trans.app.security_agent.can_modify_library_item( trans.get_current_user_roles(), content, trans.user ) ):
        trans.response.status = 400
        return "Invalid %s id ( %s ) specified." % ( content_type, str( content_id ) )

    metadata = content.get_metadata()
    content.datatype.before_setting_metadata( content )

    if not metadata:
        metadata = {}

    # Copy every remaining payload entry onto the metadata, skipping dataset attributes.
    for name in payload:
        if name not in [ 'name', 'info', 'dbkey' ]:
            setattr( metadata, name, payload[name] )

    content.set_metadata( metadata )
    content.datatype.after_setting_metadata( content )
    trans.sa_session.flush()

    return "OK"


John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


Re: [galaxy-dev] Unable to set metadata in API call

2011-05-09 Thread Duddy, John
Silly me - 'misc_info' shows up in the show() output, but I didn't bother to 
verify that it was metadata and not some other member.


Thanks!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu] 
Sent: Monday, May 09, 2011 8:27 AM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Unable to set metadata in API call

Duddy, John wrote:
> I need to be able to set some metadata in some custom data types. For now, 
> I'm just trying to set the value of the 'misc_info' field. The client script 
> is this:
> put( sys.argv[1], sys.argv[2], { 'update_type' : 'metadata', 'misc_info' : 
> 'meta data msg' } )
> and my API method is as follows. It executes fine, but the values do not show 
> on the next show() call.

Hi John,

Your method below works fine for existing metadata items like
'data_lines', so I suspect the problem is not with the update method,
but with the datatype definition.  Does your class have a 'misc_info'
MetadataElement?

--nate

> Disclaimer: Python is still quite new to me, and it's very likely I do not 
> understand the metadata model and the way library dataset associations work.
> 
> Any clues will be greatly appreciated!
> 
> @web.expose_api
> def update( self, trans, id, library_id, payload, **kwd ):
>     """
>     PUT /api/libraries/{encoded_library_id}/contents/{encoded_content_type_and_id}
>     Sets attributes (metadata) on a library item.
>     """
>     update_type = None
>     if 'update_type' not in payload:
>         trans.response.status = 400
>         return "Missing required 'update_type' parameter.  Please consult the API documentation for help."
>     else:
>         update_type = payload.pop( 'update_type' )
>     # One-element tuple: ( 'metadata' ) is just a string, so "not in" would be a substring test.
>     if update_type not in ( 'metadata', ):
>         trans.response.status = 400
>         return "Invalid value for 'update_type' parameter ( %s ) specified.  Please consult the API documentation for help." % update_type
>     content_id = id
>     decoded_type_and_id = trans.security.decode_string_id( content_id )
>     content_type, decoded_content_id = decoded_type_and_id.split( '.' )
>     if content_type not in ( 'file', ):
>         trans.response.status = 400
>         return "Updates allowed only on files, not directories"
>     try:
>         content = trans.sa_session.query( trans.app.model.LibraryDatasetDatasetAssociation ).get( decoded_content_id )
>     except Exception:
>         content = None
>     if not content or ( not trans.user_is_admin() and not trans.app.security_agent.can_modify_library_item( trans.get_current_user_roles(), content, trans.user ) ):
>         trans.response.status = 400
>         return "Invalid %s id ( %s ) specified." % ( content_type, str( content_id ) )
> 
>     metadata = content.get_metadata()
>     content.datatype.before_setting_metadata( content )
> 
>     if not metadata:
>         metadata = {}
> 
>     for name in payload:
>         if name not in [ 'name', 'info', 'dbkey' ]:
>             setattr( metadata, name, payload[name] )
> 
>     content.set_metadata( metadata )
>     content.datatype.after_setting_metadata( content )
>     trans.sa_session.flush()
> 
>     return "OK"


Re: [galaxy-dev] A tool with no inputs

2011-05-16 Thread Duddy, John
Doesn't this violate one of the basic tenets of Galaxy - reproducibility? 
Without the ability to provide full traceability to the inputs, one can make no 
guarantees about the outputs.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Paul-Michael Agapow
Sent: Monday, May 16, 2011 7:45 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] A tool with no inputs

One of my colleagues is having trouble developing a peculiar tool: it has no 
inputs. This makes sense in our local context - it fetches some constantly 
updating remote data for the current user - but implementing it has escaped our 
skill. Galaxy complains about a tool with no params (i.e. an empty <inputs>). 
It complains about a tool with no <inputs> tag. Hidden input fields don't seem 
to work (i.e. looks like I'm getting the cached value of the form).

It is admittedly a fairly niche use-case and we could put in a dummy field and 
simply not use it, but that seems inelegant. Any suggestions?


Paul Agapow (paul-michael.aga...@hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency




[galaxy-dev] Getting binary programs into Galaxy distribution?

2011-05-24 Thread Duddy, John
There is a C program for merging Gzip files (gzjoin) that I'd love to rely on 
for a core Galaxy capability. Is there a standard way to get things like this 
included in Galaxy? Recoding it in Python would be a bit of a pain, and might 
be a lot slower due to the IO layer not allowing the reuse of buffers.

Thanks -

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


Re: [galaxy-dev] Getting binary programs into Galaxy distribution?

2011-05-25 Thread Duddy, John
I had considered cat, but from what I have read, not all readers understand it, 
or understand it as a single compressed stream. Since the resulting file would 
differ from a block gzipped file (embedded gzip headers with filenames, 
embedded per-file trailers, and the final CRC/size not matching the entire 
contents) I worry cat would be a brittle solution.

Also, since AFAIK gzjoin is not part of any installable package (more of an 
example program), I don't know if it's something that can be addressed by a 
dependency installation system, unless you host the installations.

There is stuff that happens on first start, such as copying *example files. 
Would it be far-fetched to compile the program at that stage?



From: James Taylor [ja...@jamestaylor.org]
Sent: Tuesday, May 24, 2011 11:51 PM
To: Duddy, John
Cc: galaxy-...@bx.psu.edu Dev
Subject: Re: [galaxy-dev] Getting binary programs into Galaxy distribution?

John, I'll take a look at the program. There isn't a great way to do
this until the dependency installation system is working. A thin
python wrapper (using Cython) would be the usual trick we would use.

However: have you considered just using cat? This should be completely
valid for gzip (at the cost of an extra 15 bytes per source file or so
for duplicate headers). It looks like gzjoin does require
decompression of all input data so this trade off may be worthwhile.
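
The behaviour under discussion can be checked with Python's standard gzip module. This is only a sketch of the idea, using two small in-memory members rather than real files: it shows that a reader which supports multi-member gzip returns the concatenated contents, and that the size field in the final trailer describes only the last member, which is why a reader that trusts that field for the whole file could be misled.

```python
import gzip
import struct

first = gzip.compress(b"record one\n")
second = gzip.compress(b"record two\n")

# Byte-level concatenation, equivalent to: cat a.gz b.gz > joined.gz
joined = first + second

# Python's gzip module reads the concatenated members as one stream.
combined = gzip.decompress(joined)

# The last four bytes of a gzip member hold its uncompressed size (ISIZE,
# little-endian). After concatenation they describe only the final member.
(trailer_size,) = struct.unpack("<I", joined[-4:])
```

Here `combined` holds both records, while `trailer_size` equals the length of the second record only. Whether a given downstream tool copes with such files still has to be tested tool by tool.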


On May 24, 2011, at 3:13 PM, Duddy, John wrote:

> There is a C program for merging Gzip files (gzjoin) that I’d love
> to rely on for a core Galaxy capability. Is there a standard way to
> get things like this included in Galaxy? Recoding it in Python would
> be a bit of a pain, and might be a lot slower due to the IO layer
> not allowing the reuse of buffers.
>
> Thanks -
>
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
>


Re: [galaxy-dev] Getting binary programs into Galaxy distribution?

2011-05-25 Thread Duddy, John
Sounds like we'll have to do a phased solution. I'll see if cat works well 
enough for tool interoperability for now and see what we can do to make it 
better (if needed) later.

Thanks, guys!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu] 
Sent: Wednesday, May 25, 2011 9:24 AM
To: Duddy, John
Cc: James Taylor; galaxy-...@bx.psu.edu Dev
Subject: Re: [galaxy-dev] Getting binary programs into Galaxy distribution?

Duddy, John wrote:
> I had considered cat, but from what I have read, not all readers understand 
> it, or understand it as a single compressed stream. Since the resulting file 
> would differ from a block gzipped file (embedded gzip headers with filenames, 
> embedded per-file trailers, and the final CRC/size not matching the entire 
> contents) I worry cat would be a brittle solution.
> 
> Also, since AFAIK gzjoin is not part of any installable package (more of an 
> example program), I don't know if it's something that can be addressed by a 
> dependency installation system, unless you host the installations.
> 
> There is stuff that happens on first start, such as copying *example files. 
> Would it be far-fetched to compile the program at that stage?

We probably can't make the assumption that the user has the zlib dev
package with the headers, or even a C compiler, but this is a general
problem that we will need to solve as we implement tight dependency
control.

My initial thought on this is automatic fetching of precompiled
binaries, and if not available for your platform then we can try to
automatically compile.  Although really, I would be surprised if anyone
is running tools on anything other than Mac or Linux.

For the immediate solution, we could probably fetch the binary much like
we do for eggs.

--nate


[galaxy-dev] Is dynamic associated information per dataset possible?

2011-05-25 Thread Duddy, John
We'd like to be able to associate fixed things (project, Sample, sequencer 
used) with user's FASTQ files, and we'd also like to allow users to associate 
dynamic, site-specific stuff with the sequencing run. Currently, users track 
their runs using a CSV sample sheet, and often they add columns to that sample 
sheet for their own information.

Is it possible to associate that information with the FASTQ file when it is 
placed in Galaxy? I know about metadata, but the supported fields look like 
they are fixed in the code. I was hoping for a solution where the users do not 
need to modify the Galaxy code to pull this off.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


Re: [galaxy-dev] Is dynamic associated information per dataset possible?

2011-05-26 Thread Duddy, John
That's good news. Is this available via the API as well? I didn't see examples 
of this anywhere in the code, but I thought it might be available by passing 
additional values in the dictionary.


-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu] 
Sent: Thursday, May 26, 2011 1:38 AM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Is dynamic associated information per dataset 
possible?

Duddy, John wrote:
> We'd like to be able to associate fixed things (project, Sample, sequencer 
> used) with user's FASTQ files, and we'd also like to allow users to associate 
> dynamic, site-specific stuff with the sequencing run. Currently, users track 
> their runs using a CSV sample sheet, and often they add columns to that 
> sample sheet for their own information.
> 
> Is it possible to associate that information with the FASTQ file when it is 
> placed in Galaxy? I know about metadata, but the supported fields look like 
> they are fixed in the code. I was hoping for a solution where the users do 
> not need to modify the Galaxy code to pull this off.

Hi John,

You could accomplish this with library templates:

https://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/LibraryTemplates

--nate



Re: [galaxy-dev] Is dynamic associated information per dataset possible?

2011-05-26 Thread Duddy, John
The sample tracking system looks interesting. It looks like this is designed to 
have the sequencers modeled in Galaxy, have Galaxy pull the data from the 
sequencers, and might assume 1-1 pairing between sequencing run and samples. 

I'd like to be able to support pushing files from a central location and 
variably multiplexed runs, all via the API (or extensions to the API).

Is there any up-to-date documentation on this feature?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Greg Von Kuster [mailto:g...@bx.psu.edu] 
Sent: Thursday, May 26, 2011 2:18 AM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Is dynamic associated information per dataset 
possible?

In addition to Data Library templates, which are useful after the sequencer 
data has arrived in a Galaxy Data Library, Galaxy's sample tracking system 
includes sample run templates which are very similar to the Data Library 
templates, but are associated with a sample as it progresses through its 
sequence run lifecycle in the facility.  Sample run details templates are 
defined by the facility administrator.  They can be created in the Admin view 
via the Manage form definitions menu link.





[galaxy-dev] Accessing Data Library Template fields in tools?

2011-05-26 Thread Duddy, John
I have my data in a data library and have a form template defined so I can 
enter the sample information.

So, I import a data file into a history and want to run a tool on it. Can I 
pass the values of those form templates to my tool? Sort of like 
${input.form_field_id} ?

Thanks!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


[galaxy-dev] Sharing authentication between Galaxy and other WSGI apps on the same web server (with custom UI)?

2011-06-20 Thread Duddy, John
I'd like to have Galaxy and another application installed on the same Apache 
server and have the user authenticate only once. I think I understand how to do 
that by deferring authentication to Apache (instead of using Galaxy's built-in 
database). So far, so good, I think.

What I'm wondering is if it is possible (in external user mode) to control the 
user experience of authentication versus being stuck with the one where the 
browser pops up the authentication dialog. Is it possible to implement a shared 
authentication mechanism that uses web pages for the UI? Or would we have to 
give up Apache-based security and snoop the Galaxy session cookie instead?

Many thanks -

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


Re: [galaxy-dev] Sharing authentication between Galaxy and other WSGI apps on the same web server (with custom UI)?

2011-06-20 Thread Duddy, John
Thanks! That's perfect.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

-Original Message-
From: Assaf Gordon [mailto:gor...@cshl.edu] 
Sent: Monday, June 20, 2011 12:09 PM
To: Duddy, John; galaxy-...@bx.psu.edu
Subject: Re: [galaxy-dev] Sharing authentication between Galaxy and other WSGI 
apps on the same web server (with custom UI)?

Hello John,

I'm not an Apache expert, but I can try to help with some info.

Your question involves two different issues, which are not dependent on one 
another.

First, can one setup apache authentication that will affect both Galaxy and 
other "things" on your server ?
The answer is yes.

Example:
We've setup our authentication on the root location of the server.
Galaxy uses the prefix "/galaxy", and other services use other prefixes, and 
since all of them are "below" the root location, the authentication applies to 
all.
The user needs to login only once.

===
## Root location of the server, protected with NTLM authentication
<Location />
 AuthName CSHL
 AuthType NTLM
 NTLMAuth on
 NTLMAuthoritative on
 ### couple of other authentication parameters...
</Location>

##
## Galaxy uses load-balancing and mod_rewrite and other things,
## but since it's below the root location, it will use the same authentication
# Galaxy server
<Proxy balancer://galaxyprod>
BalancerMember http://localhost:8081
BalancerMember http://localhost:8082
</Proxy>
RewriteRule ^/galaxy$ /galaxy/ [R]
RewriteRule ^/galaxy(.*) balancer://galaxyprod$1 [P]
<Location "/galaxy">
require valid-user
</Location>

##
## Other services on the same server will use the same authentication,
## and can also limit user access with "require" statement.
Alias /plans/ "/home/gordon/projects/plans/"
<Directory "/home/gordon/projects/plans/">
require user gordon 
</Directory>
===


Second,
Can apache use authentication which is not "built-in" in the browser, so 
instead of OS native ugly dialog, the user will see a custom web page?
The answer is still yes, because authentication in Apache is modular.

If you specify "AuthType BASIC" or "AuthType Digest" or "AuthType NTLM" (which 
are the only universally supported built-in authentication methods I'm aware 
of), then the client-side browser will display an OS native user/password 
dialog.

If you install a custom authentication module, then you can use "AuthType 
CUSTOMXXX" (or sometimes a different command) and apache will use the module 
for custom authentication (which can involve custom webpages or anything else).
As long as the custom module notifies apache that the user is authenticated, 
Apache doesn't care how it's done.

There's one apache module called "mod_auth_form" ( 
http://httpd.apache.org/docs/trunk/mod/mod_auth_form.html ) which does exactly 
that, but I'm not sure if it's considered stable.


There are other 3rd party solutions available; unfortunately those solutions are 
usually quite complicated and laborious to install (I've read about them but 
never tried them myself):
http://blog.ianbicking.org/more-on-single-signon.html
https://neon1.net/mod_auth_pubtkt/
http://cosign.sourceforge.net/
http://mod-auth-script.sourceforge.net/

All of them claim to provide apache integration.

And just as in the first question, once you change the "AuthType" in the root 
location to a custom authentication module, all the other sub-URLs will use 
that authentication.
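
From the application side, the arrangement described above means a WSGI app behind the server only has to trust a user name handed to it by the front end. The sketch below illustrates that idea under stated assumptions: that the web server forwards the authenticated login in a REMOTE_USER request header (visible to WSGI as HTTP_REMOTE_USER), and that the names `remote_user_app` and `call_app` are purely illustrative. It is not Galaxy's own code.

```python
def remote_user_app(environ, start_response):
    """Greet the user that the front-end web server already authenticated."""
    # Assumption: the server forwards the login name in a REMOTE_USER request
    # header, which WSGI exposes as HTTP_REMOTE_USER.
    user = environ.get("HTTP_REMOTE_USER") or environ.get("REMOTE_USER")
    if not user:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"No authenticated user was passed by the web server.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello, %s\n" % user).encode("utf-8")]

def call_app(app, environ):
    """Invoke a WSGI app directly and return (status, body) for inspection."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body

ok_status, ok_body = call_app(remote_user_app, {"HTTP_REMOTE_USER": "gordon"})
anon_status, anon_body = call_app(remote_user_app, {})
```

Because the app never sees a password, the same pattern works whichever authentication module produced the user name, provided the application cannot be reached except through the authenticating server.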

If you do get one of those to work, I'm interested in hearing about it, because 
I would like eventually to get rid of NTLM authentication.

Regards, 
  -gordon




[galaxy-dev] Getting a list of workflows a user can run via the API

2011-07-25 Thread Duddy, John
I am doing an integration with Galaxy, and part of what I need to do is trigger 
workflows. To do that, I need to list them.

I can do this if the user owns the workflows, but the API does not return 
workflows that have been shared with the user.

Is there a way via the API to discover the sharing relationships? Or will I 
need to add this to the API?

Thanks!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Cluster setup - shared temporary directory

2011-07-26 Thread Duddy, John
I can give you a very good example - if you are doing alignment and for some 
reason need to convert the input file before operating on them, such that you 
need a complete copy, /tmp may not have enough room. I have had this happen to 
me running lots of instances of an aligner, temporarily using 100G+ of temp 
space.

I don't see the need to have a "shared" temp space, but I do see the need to be 
able to tell the tools where you want them to put temp files. 
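
A minimal sketch of what "telling the tool where to put temp files" could look like from the tool side, written for Python 3. The environment variable name TOOL_TMPDIR and both function names are made up for illustration and are not Galaxy settings; the idea is only to pick a scratch directory that has enough free space before creating the file.

```python
import os
import shutil
import tempfile


def pick_temp_dir(required_bytes, candidates=None):
    """Return the first candidate directory with at least required_bytes free."""
    if candidates is None:
        # TOOL_TMPDIR is a hypothetical variable name used for illustration.
        candidates = [os.environ.get("TOOL_TMPDIR"), tempfile.gettempdir()]
    for path in candidates:
        if path and os.path.isdir(path):
            if shutil.disk_usage(path).free >= required_bytes:
                return path
    raise OSError("no temp directory with %d bytes free" % required_bytes)


def make_scratch_file(required_bytes, suffix=".tmp"):
    """Create an empty scratch file in a directory that has enough room."""
    scratch_dir = pick_temp_dir(required_bytes)
    fd, path = tempfile.mkstemp(suffix=suffix, dir=scratch_dir)
    os.close(fd)
    return path
```

The free-space check is only a snapshot; other jobs on the same node can still fill the directory afterwards.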

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Peter Cock
Sent: Tuesday, July 26, 2011 8:10 AM
To: Galaxy Dev
Subject: [galaxy-dev] Cluster setup - shared temporary directory

Hi all,

I'm reading http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Cluster

Could someone expand a little on this section please:

> Create a shared temporary directory
>
> Some tools make use of temporary files created on the server,
> but accessed on the nodes. For this, you'll need to make a
> directory (galaxy_dist/database/tmp by default) ...

I presume this is talking about the universe_wsgi.ini setting
new_file_path = database/tmp (if so, could that be explicit)?

I would like to know more about this from the tool author point
of view. Could you at least give one example of a tool that uses
this temporary folder? As a tool author I am unclear what the
purpose is (and it would be a shock if I accidentally use this
mapped folder instead of the local temp drive of a node).

Thanks,

Peter
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Problems with Galaxy on a mapped drive

2011-07-29 Thread Duddy, John
We had similar problems on NFS mounts to Isilon. We traced it to the default 
timeout for attribute caching on NFS mounts, which does not force a re-read of 
directory contents (hence file existence or size) for up to 30 seconds.

We worked around it by adding noac to the mount options, but this can drastically 
increase the network traffic to the Isilon, so there are tradeoffs to be made.

Even when you solve this, NFSv2 does not have open-close write consistency, so 
it is possible for a job to complete on a node and Galaxy to try to read the 
output files while the compute node is still flushing its write cache to the 
file. 

All of these scenarios are unlikely on a busy cluster, on which job<->Galaxy 
interactions will likely occur far enough apart in time for the caches to clear 
on their own.
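
As a rough illustration of an alternative to a fixed sleep, here is a Python 3 sketch that polls until a file exists and its reported size stops changing. The function name and defaults are assumptions for illustration, not anything in Galaxy. Note that with attribute caching enabled the stat results themselves may be stale, so this reduces the window rather than closing it.

```python
import os
import time


def wait_for_stable_file(path, timeout=60.0, interval=1.0, settle_checks=2):
    """Poll until path exists and its size is unchanged for settle_checks polls.

    Returns the final size, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            size = None
        if size is not None and size == last_size:
            # Same size as the previous poll: count it towards "settled".
            stable += 1
            if stable >= settle_checks:
                return size
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    raise TimeoutError("%s did not settle within %.0f s" % (path, timeout))
```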

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Peter Cock
Sent: Friday, July 29, 2011 6:36 AM
To: Galaxy Dev
Subject: [galaxy-dev] Problems with Galaxy on a mapped drive

Hi all,

In my recent email I mentioned problems with our setup and mapped drives. I
am running a test Galaxy on a server under a CIFS mapped drive. If I map the
drive with noperms then things seem to work with submitting jobs to the cluster
etc, but that doesn't seem secure at all. Mounting with strict permissions seems
to cause various network latency related problems in Galaxy though.

Specifically during loading the converters and history export tool,
Galaxy creates
a temporary XML file which it then tries to parse. I was able to resolve this by
switching from tempfile.TemporaryFile to tempfile.mkstemp and adding a 1s
sleep, but it isn't very elegant. Couldn't you use a StringIO handle instead?
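
To make the StringIO suggestion concrete, a small Python 3 sketch (Galaxy at the time ran on Python 2, so the module names would differ there). The XML string is a made-up stand-in, not the real generated tool definition, and parse_generated_xml is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def parse_generated_xml(xml_text):
    """Parse XML held in memory, avoiding a round trip through the filesystem."""
    handle = io.StringIO(xml_text)
    return ET.parse(handle).getroot()


# The tool definition below is a made-up stand-in for illustration only.
root = parse_generated_xml('<tool id="example_converter"><inputs/></tool>')
print(root.get("id"))
```

Because nothing is written to the mapped drive, the latency between creating and reading the file does not come into play.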

Later during start up there were two errors with a similar issue -
Galaxy creates
a temp folder then immediately tries to write a tar ball or zip file
to it. Again,
adding a 1 second sleep after creating the directory before using it seems to
work. See lib/galaxy/web/controllers/dataset.py

After that Galaxy started, but still gives problems - like the issue
reported here
which Galaxy handled badly (see patch):
http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-July/006213.html

Here again, inserting a one second sleep between writing the cluster script
file and setting its permissions made it work.

If those are the only issues, that can be dealt with. But are there likely to be
lots more similar problems of this nature later on? That is my worry.

How are most people setting up mapped drives for Galaxy with a cluster?

Thanks,

Peter
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] using Galaxy for map/reduce

2011-08-02 Thread Duddy, John
I did something similar, but implemented as an evolution of the original 
"basic" parallelism (see BWA), that:
- Moved the splitting of input files into the datatype classes
- Allowed any number of inputs to be split, as long as they were the same 
datatype (so they were mutually consistent - think paired end fastq files)
- Allowed other inputs to be shared among jobs
- Merged any number of outputs, with merge code implemented in the datatype 
classes

This worked functionally, but the IO required to split large files has proved 
too much for something like a whole genome (~500GB)

I was thinking of something philosophically similar to your dataset container 
idea, but more in the idea that a dataset is no longer a "file", so the jobs 
running on subsets of the dataset would just ask for the parts they need. 
Galaxy would take care of preserving the abstraction that the subset of the 
dataset is a single input file, perhaps by extracting the subset to a temporary 
file on local storage. Similarly, the merged outputs would just be held in the 
target dataset, not copied, thus making the IO cost for the "merge" 0 for the 
simple case where it is mere concatenation.
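
A toy Python 3 sketch of the splitting idea, not the actual Galaxy code: two paired FASTQ inputs are cut at the same record boundaries so that chunk i of one always matches chunk i of the other, and the simple merge is plain concatenation. All function names are hypothetical, and it assumes plain 4-line FASTQ records held in memory, which is exactly the IO cost that does not scale to whole genomes.

```python
from itertools import islice

LINES_PER_FASTQ_RECORD = 4


def split_records(lines, records_per_chunk):
    """Yield lists of lines, each holding up to records_per_chunk FASTQ records."""
    iterator = iter(lines)
    chunk_lines = records_per_chunk * LINES_PER_FASTQ_RECORD
    while True:
        chunk = list(islice(iterator, chunk_lines))
        if not chunk:
            return
        yield chunk


def split_paired(r1_lines, r2_lines, records_per_chunk):
    """Split two paired inputs at the same record boundaries."""
    r1_chunks = list(split_records(r1_lines, records_per_chunk))
    r2_chunks = list(split_records(r2_lines, records_per_chunk))
    if [len(c) for c in r1_chunks] != [len(c) for c in r2_chunks]:
        raise ValueError("paired inputs do not contain the same number of records")
    return list(zip(r1_chunks, r2_chunks))


def merge_by_concatenation(output_chunks):
    """Merge per-job outputs in job order (the simple concatenation case)."""
    merged = []
    for chunk in output_chunks:
        merged.extend(chunk)
    return merged
```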

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Andrew Straw
Sent: Tuesday, August 02, 2011 7:13 AM
To: galaxy-...@bx.psu.edu
Subject: [galaxy-dev] using Galaxy for map/reduce

Hi all,

I've been investigating use of Galaxy for our lab and it has many
attractive aspects -- a big thank you to all involved.

We still have a couple of related sticking points, however, that I would
like to get the Galaxy developers' feedback on. Basically, I want to use
Galaxy to run Map/Reduce type analysis on many initial data files. What
I mean is that I want to take many initial datasets (e.g. 250 or more),
perhaps already stored in a library, and then apply a workflow to each
and every one of them (the Map step). Then, on the many result datasets
(one from each of the initial datasets), I want to run a Reduce step
which creates a single dataset. I have achieved this in an imperfect and
not-quite-working way with a few tricks, but I hope that with a little
work, Galaxy could be much better for this type of use case.

I have a couple of specific problems and a proposal for a general solution:

1) My first specific problem is that loading many datasets (e.g. 250)
into history causes the javascript running locally within a browser to
be extremely slow.

2) My second specific problem is that applying a workflow with N steps
to many datasets creates even more datasets (Nx250 additional datasets).
In addition to the slow Javascript problem, there seem to be other
issues I haven't diagnosed further, but the console in which I'm running
run.sh indicates many errors of the type "Exception AssertionError:
AssertionError('State  is not present in this identity map',) in
> ignored". Furthermore the webserver gets slow and my
nginx frontend proxy gives 504 gateway time-outs.

3) There's no good way to do reduce within Galaxy. Currently I work
around this by having a tool type which takes as an input a dataset and
then uploads this to a self-written webserver, which then collects such
uploads, performs the reduce, and offers a download link for the user to
collect the reduced dataset. The user must manually then upload this
dataset back into Galaxy for further processing.

My proposal for a general solution, and what I'd be interested in
feedback on, is an idea of a "dataset container" (this is just a working
name). It would look and act much like a dataset in the history, but
would in fact be a logical construct that merely bundles together a
homogeneous bunch of datasets. When a tool (or a workflow) is applied to
a dataset container, Galaxy would automatically create a new container
in which each dataset in this new container is the result of running the
tool. (Workflows with N steps would thus generate N new containers.) The
thing I like about this idea is that it preserves the ability to use
tools and workflows on both individual datasets and, with some
additional logic, on these new containers. In particular, I don't think
the tools and workflows themselves would have to be modified. This would
seemingly mitigate the slow Javascript issue by only showing a few items
in the history window (even though Galaxy may have launched many jobs in
the background). Furthermore, a new Reduce tool type could then act to
take a dataset container as input and output a single dataset.

A library doesn't seem a good candidate for the dataset container idea I
have above. I realize that a library also bundles together datasets, but
it has other attributes that don't play well with the above idea (the
idea of hierarchically arranged folders and heterogeneous datasets

[galaxy-dev] Question on timing with API, running workflows, and setting metadata

2011-08-02 Thread Duddy, John
I'd like to have an external program that registers a file by absolute path 
(link, not upload) in a data library, then immediately starts a workflow on it. 
My question is whether that will work in the general case where:

-  The system is configured to set metadata externally

-  The compute grid is busy (so the set_meta call is queued for some 
time)

In this case, will the workflow be queued and set idle, waiting for the 
set_meta() to complete? Or will the API call to register the data library block 
until the set_meta() is done?
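
Whatever the answer turns out to be, one defensive option on the client side would be to poll the dataset state before starting the workflow. This Python 3 sketch deliberately does not call any Galaxy API: get_state is a caller-supplied function (hypothetical here), and the state names "ok" and "error" are assumptions about what such a function would return.

```python
import time


def wait_for_state(get_state, ready_states=("ok",), failed_states=("error",),
                   timeout=600.0, interval=5.0, sleep=time.sleep):
    """Poll get_state() until it reports a ready state.

    get_state is a caller-supplied function (hypothetical here) that asks the
    server for the dataset's current state and returns it as a string.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in ready_states:
            return state
        if state in failed_states:
            raise RuntimeError("dataset entered state %r" % state)
        if waited >= timeout:
            raise TimeoutError("still %r after %.0f s" % (state, waited))
        sleep(interval)
        waited += interval
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays.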

Thanks!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] Customizing/reusing the workflows/run.mako template

2011-08-09 Thread Duddy, John
I'd like to integrate with Galaxy and launch workflows, and I am hoping to 
reuse Galaxy's support for prompting the user for items that are tagged as 
being set at runtime. It looks to me that I could load the workflows/run page 
in a frame on my app (on the same server) and reuse it.

The problem I'm having is that I don't understand how the parameters for the 
input files are encoded. For example, here is one for adding a column to a 
dataset:

1: Add column on data 6

What are 22 and 53? They don't seem to correspond to dataset IDs.

Is there a way that I can encode data library IDs such that I can submit this 
form and have Galaxy run the workflow with the right files in a new history?

Thanks!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Customizing/reusing the workflows/run.mako template

2011-08-10 Thread Duddy, John
Interesting - this was a workflow with exactly 1 step - the "add a column" 
tool. I looked in my database/files/000 directory, and there was no dataset id 
53 - would that be an id in the history_dataset_association table?

In any case, I would like to be able to start a workflow for the user without 
interfering with any notion of the "current" workflow they may have open in 
their browser. Is that possible now, or do I need to:

-  Create a history and set it to be the "current" history (how is that 
done?)

-  Import my datasets into it

-  Fetch the form (which will autopopulate the file inputs from the 
"current" history)

-  Let them launch it by submitting the form?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com<mailto:jdu...@illumina.com>

From: James Taylor [mailto:ja...@jamestaylor.org]
Sent: Tuesday, August 09, 2011 5:41 PM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Customizing/reusing the workflows/run.mako template

John, the prefixes like "22|" are added to the inputs associated with each 
step, so that they can be separated back out. In this case, the chunk of HTML 
you have pasted likely corresponds to the 22nd step of the workflow.

53 on the other hand should be a dataset id from the current history (unencoded 
in this case).
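
To illustrate how such prefixed keys can be separated back out, a small Python 3 sketch. It is not the Galaxy implementation; the parameter name "input" is hypothetical, and 22 and 53 are simply the values from the question.

```python
def group_by_step(form_params):
    """Group 'step|name' form keys into {step: {name: value}}."""
    steps = {}
    for key, value in form_params.items():
        prefix, sep, name = key.partition("|")
        if not sep or not prefix.isdigit():
            continue  # not a step-prefixed field
        steps.setdefault(int(prefix), {})[name] = value
    return steps


# "input" is a hypothetical parameter name; 22 and 53 come from the thread.
print(group_by_step({"22|input": "53", "history": "new"}))
# → {22: {'input': '53'}}
```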

On Aug 9, 2011, at 8:24 PM, Duddy, John wrote:


I'd like to integrate with Galaxy and launch workflows, and I am hoping to 
reuse  Galaxy's support for prompting the user for items that are tagged as 
being set at runtime. It looks to me that I could load the workflows/run page 
in a frame on my app (on the same server) and reuse it.

The problem I'm having is that I don't understand how the parameters for the 
input files are encoded. For example, here is one for adding a column to a 
dataset:

1: Add column on data 6

What are 22 and 53? They don't seem to correspond to dataset IDs.

Is there a way that I can encode data library IDs such that I can submit this 
form and have Galaxy run the workflow with the right files in a new history?

Thanks!

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com<mailto:jdu...@illumina.com>

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/

[galaxy-dev] Modifying how runtime inputs are resolved when running a workflow

2011-08-18 Thread Duddy, John
This is probably something only the Galaxy devs can answer, but I thought I'd 
give it a shot in the wider community. Some of you are doing some very 
complicated stuff.

If you have a workflow with several input blocks, you might have multiple fastq 
files you need to provide. A good example of this is paired-end analysis - it 
doesn't really matter in which order you provide the fastq files for R1 and R2, 
but providing just one or the other twice requires user intervention.

It would be really nice if Galaxy assigned a unique value from the history to 
runtime parameters (tool params and input modules) before reusing datasets. I'm 
trying to get the initial form for running a workflow as close as possible to 
"right" to reduce the number of steps a user has to go through. So, in this 
example, R1 would be assigned to one input and R2 to another (or the reverse), 
as the initial defaults.

Is that what the "context" parameter of ToolParameter.get_initial_value() is 
for? I see that is used for repeat-group and conditionals, but it's used to 
store all the eligible values, not just the ones chosen from the history so 
far. For other tools, it accumulates the input values chosen for the tool.

Would it be consistent with the design to modify templates/workflow/run.mako to 
accumulate a dictionary of datasets used, instead of clearing it for each 
module? Since InputDataModule uses DataToolParameter, I think I can make the 
change in DataToolParameter and get input blocks and tools in one fell swoop, 
and history items would not be reused across the entire workflow, providing the 
number of runtime parameters did not exceed the number of compatible history 
items.
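
The behaviour I have in mind can be sketched in a few lines of Python 3. This is a standalone illustration, not a patch against DataToolParameter; the function and input names are made up. The "used" set plays the role of the dictionary that would be kept across modules instead of being cleared.

```python
def assign_defaults(input_names, compatible_items):
    """Give each runtime input a distinct history item until they run out."""
    used = set()
    defaults = {}
    for name in input_names:
        unused = [item for item in compatible_items if item not in used]
        if unused:
            choice = unused[0]
        elif compatible_items:
            # Fall back to reuse only when every compatible item is taken.
            choice = compatible_items[0]
        else:
            choice = None
        defaults[name] = choice
        if choice is not None:
            used.add(choice)
    return defaults


print(assign_defaults(["fastq_1", "fastq_2"], ["R1.fastq", "R2.fastq"]))
# → {'fastq_1': 'R1.fastq', 'fastq_2': 'R2.fastq'}
```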

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] "multiple" inputs in tools used in workflows?

2011-08-23 Thread Duddy, John
While snooping around the Galaxy code, I noticed that some tool features are 
not supported in workflows, only in histories. Is there a list somewhere that 
lists the restrictions?

Specifically, are "multiple" inputs supported?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

[galaxy-dev] Storing a dict as metadata

2011-08-25 Thread Duddy, John
I'd like to have a datatype with a dict as metadata. This dict() would store 
file offsets to enable seeking around to process different sections of the file.

How do I add a dictionary metadata element?



John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] Storing a dict as metadata

2011-08-26 Thread Duddy, John
A converted dataset would be fine too.

I'm working on an enhancement that would allow the metadata to be provided when 
the file is uploaded/registered via the API. So to do what you say, I'd need to 
have a way of providing that converted dataset.

The files I'm talking about are concatenated GZIP files, and the GZIP format 
specification doesn't contain any information about the size of the compressed 
data, only the uncompressed size (and then, modulo 2^32). AFAIK, anything in 
Galaxy that tried to create the auxiliary index would need to read and 
decompress all the data in the file to do that - easily an hour's worth of work 
for some of our full genome runs. We have all that information already when we 
make the file, so I'd prefer to just give it to Galaxy at the start. I could 
place stuff in a special section in the first GZIP header, but then this 
capability would not be as general-purpose as it could be.
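
A rough Python 3 sketch of the idea, with made-up function names: the writer emits concatenated gzip members and records each member's byte offset and length as it goes, so a reader can later seek to one member and decompress only that part. The index is an ordinary dict, which is why I'd like to hand it over as metadata.

```python
import gzip
import io
import json


def write_indexed_gzip(records, records_per_member, out):
    """Write records as concatenated gzip members and return their offsets.

    out must be a binary file object opened for writing. The returned dict maps
    the index of the first record in each member to (byte offset, byte length).
    """
    index = {}
    for start in range(0, len(records), records_per_member):
        member = gzip.compress(b"".join(records[start:start + records_per_member]))
        index[start] = (out.tell(), len(member))
        out.write(member)
    return index


def read_member(source, offset, length):
    """Decompress one member without touching the rest of the file."""
    source.seek(offset)
    return gzip.decompress(source.read(length))


buf = io.BytesIO()
records = [b"record %d\n" % i for i in range(10)]
index = write_indexed_gzip(records, 4, buf)
print(read_member(buf, *index[8]))  # the third member: records 8 and 9
print(json.dumps({str(k): v for k, v in index.items()}))  # JSON-friendly form
```

The file as a whole is still a valid gzip stream, so tools that read it front to back are unaffected.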

I also want to prevent unnecessary gzip decompression in python, because 
serious decompression in versions before 2.7 is so slow as to be unusable for 
large datasets.

Is there a way to upload that converted dataset when I upload/register the main 
file? I'd also need to know how to write such a file.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: James Taylor [mailto:ja...@jamestaylor.org] 
Sent: Friday, August 26, 2011 5:37 AM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Storing a dict as metadata

Hey John, are you sure you don't want to use a "converted dataset" rather than 
a metadata element for this? This is how we handle most types of secondary 
indexes for visualization. 

If you do it this way, the converter that creates the offset index is just 
another tool (but registered in datatypes_conf.xml) and the index itself is 
another dataset that can be accessed through the converted datasets 
relationship. 

On Aug 25, 2011, at 6:12 PM, Duddy, John wrote:

> I'd like to have a datatype with a dict as metadata. This dict() would store 
> file offsets to enable seeking around to process different sections of the 
> file.
>  
> How do I add a dictionary data metadata element?
>  
>  
>  
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
>  
> ___
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] using Galaxy for map/reduce

2011-08-26 Thread Duddy, John
Many of the tools out there work on files, and assume they are supposed to work 
on the whole file (or take arguments for subsets that vary from tool to tool).

I'm working on a way for Galaxy to handle all these tools transparently, even 
if, as in my case, the files are compressed but the tools cannot read 
compressed files.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Edward Kirton [mailto:eskir...@lbl.gov] 
Sent: Friday, August 26, 2011 12:34 PM
To: Duddy, John
Cc: galaxy-...@bx.psu.edu
Subject: Re: [galaxy-dev] using Galaxy for map/reduce

Not intending to hijack the thread, but in response to John's comment
-- I, too, made a general solution for embarrassingly parallel problems
but instead of splitting the large files on disk, I just use seek to
move the file pointer so each task can grab its part.
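
Roughly, the seek approach can be sketched like this in Python 3. It is an illustration under assumptions rather than my actual code: the function names are made up, and the ranges are aligned to line ends only, so a multi-line record format such as FASTQ would need an extra step to find the next record start.

```python
import os


def line_aligned_ranges(path, parts):
    """Return (start, end) byte ranges that cover the file and end on newlines."""
    size = os.path.getsize(path)
    boundaries = [0]
    with open(path, "rb") as handle:
        for i in range(1, parts):
            handle.seek(size * i // parts)
            handle.readline()  # advance to the end of the current line
            boundaries.append(min(handle.tell(), size))
    boundaries.append(size)
    # Drop empty ranges that appear when two cut points fall in the same line.
    return [(a, b) for a, b in zip(boundaries, boundaries[1:]) if b > a]


def read_range(path, start, end):
    """Read one task's share of the file."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(end - start)
```

Each task gets one (start, end) pair and reads only its own bytes, so nothing is copied on disk.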

On Tue, Aug 2, 2011 at 10:54 AM, Duddy, John  wrote:
> I did something similar, but implemented as an evolution of the original 
> "basic" parallelism (see BWA), that:
> - Moved the splitting of input files into the datatype classes
> - Allowed any number of inputs to be split, as long as they were the same 
> datatype (so they were mutually consistent - think paired end fastq files)
> - Allowed other inputs to be shared among jobs
> - Merged any number of outputs, with merge code implemented in the datatype 
> classes
>
> This worked functionally, but the IO required to split large files has proved 
> too much for something like a whole genome (~500GB)
>
> I was thinking of something philosophically similar to your dataset container 
> idea, but more in the idea that a dataset is no longer a "file", so the jobs 
> running on subsets of the dataset would just ask for the parts they need. 
> Galaxy would take care of preserving the abstraction that the subset of the 
> dataset is a single input file, perhaps by extracting the subset to a 
> temporary file on local storage. Similarly, the merged outputs would just be 
> held in the target dataset, not copied, thus making the IO cost for the 
> "merge" 0 for the simple case where it is mere concatenation.
>
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
>
>
> -Original Message-
> From: galaxy-dev-boun...@lists.bx.psu.edu 
> [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Andrew Straw
> Sent: Tuesday, August 02, 2011 7:13 AM
> To: galaxy-...@bx.psu.edu
> Subject: [galaxy-dev] using Galaxy for map/reduce
>
> Hi all,
>
> I've been investigating use of Galaxy for our lab and it has many
> attractive aspects -- a big thank you to all involved.
>
> We still have a couple of related sticking points, however, that I would
> like to get the Galaxy developers' feedback on. Basically, I want to use
> Galaxy to run Map/Reduce type analysis on many initial data files. What
> I mean is that I want to take many initial datasets (e.g. 250 or more),
> perhaps already stored in a library, and then apply a workflow to each
> and every one of them (the Map step). Then, on the many result datasets
> (one from each of the initial datasets), I want to run a Reduce step
> which creates a single dataset. I have achieved this in an imperfect and
> not-quite-working way with a few tricks, but I hope that with a little
> work, Galaxy could be much better for this type of use case.
>
> I have a couple of specific problems and a proposal for a general solution:
>
> 1) My first specific problem is that loading many datasets (e.g. 250)
> into history causes the javascript running locally within a browser to
> be extremely slow.
>
> 2) My second specific problem is that applying a workflow with N steps
> to many datasets creates even more datasets (Nx250 additional datasets).
> In addition to the slow Javascript problem, there seems to be other
> issues I haven't diagnosed further, but the console in which I'm running
> run.sh indicates many errors of the type "Exception AssertionError:
> AssertionError('State  object at 0x7f5c18c47990> is not present in this identity map',) in
>   0x7f5c18c47990>> ignored". Furthermore the webserver gets slow and my
> nginx frontend proxy gives 504 gateway time-outs.
>
> 3) There's no good way to do reduce within Galaxy. Currently I work
> around this by having a tool type which takes as an input a dataset and
> then uploads this to a self-written webserver, which then collects such
> uploads, performs the reduce, and offers a download link for the user to
> collect the reduc

Re: [galaxy-dev] Storing a dict as metadata

2011-08-26 Thread Duddy, John
I'm looking into these, and it seems that the spirit is to store a version of 
the data that is converted, like a FASTQ -> BAM or some such use case, where 
one file can be extracted from the other.

Am I missing a dimension to these files?

In any case, I'd have to add the ability to associate the files in the API, 
probably a new operation in the update method for library contents?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: James Taylor [mailto:ja...@jamestaylor.org] 
Sent: Friday, August 26, 2011 1:52 PM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Storing a dict as metadata

Not currently, but since a converted dataset is just a dataset, you could reuse 
all of the existing upload mechanism, and just add the converted dataset 
connection between the two after the fact. 
 

On Aug 26, 2011, at 11:54 AM, Duddy, John wrote:

> Is there a way to upload that converted dataset when I upload/register the 
> main file? I'd also need to know how to write such a file.


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] outputting different numbers of files based on variables?

2011-08-26 Thread Duddy, John
The BWA tool in NGS mapping does just what you want, just for inputs. The 
general idea is to use a  element and define your extra output in 
a  block.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Nikhil Joshi
Sent: Friday, August 26, 2011 3:57 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] outputting different numbers of files based on variables?

Hi all,

Is it possible to output a different number of files based on variables that 
the user has chosen?  I have a program that will output different numbers of 
files based upon the input data.  So, for example, if the user wants to use a 
single-end fastq file, the program outputs only one file, however, if the user 
chooses paired-end fastq files, it outputs three files.  Is there any way to 
get that to work in just one tool?  I could make separate tools for single vs. 
paired end, but I'd rather not.

- Nik.
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] [galaxy-user] Add library to dataset performance metric: developer vs production instances

2011-09-29 Thread Duddy, John
We routinely put large compressed fastq files into data libraries by that 
method (linking, no copy) and it is very fast, since the patch that stopped it 
decompressing the files.

You should probably make sure you specify the file format (fastqsanger) so 
Galaxy does not attempt to sniff the file to learn its datatype.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Jennifer Jackson
Sent: Thursday, September 29, 2011 12:13 PM
To: Roman Valls; Galaxy-Dev
Cc: galaxy-u...@lists.bx.psu.edu
Subject: Re: [galaxy-dev] [galaxy-user] Add library to dataset performance 
metric: developer vs production instances

Hi Roman,

This is a good question for the development community to provide 
feedback on, so I'll cross-post your question over to that list.

Best,

Jen
Galaxy team

On 9/19/11 2:30 PM, Roman Valls wrote:
> Hello,
>
> Today I was routinely adding a 27GB Illumina lane on my galaxy instance
> running on a cluster node. Just the regular cloned-from-hg type of
> instance with set_metadata_externally, no more tuning.
>
> It took more than 10 minutes to have the dataset imported into a data
> library via the filesystem path upload method... not copying it into
> galaxy, just "linking".
>
> galaxy.jobs INFO 2011-09-19 18:05:08,641 job 120 dispatched
> (...)
> galaxy.jobs DEBUG 2011-09-19 18:16:52,822 job 120 ended
> galaxy.datatypes.metadata DEBUG 2011-09-19 18:16:52,824 Cleaning up
> external metadata files
>
> Since I cannot add datasets to libraries in usegalaxy.org and compare, I
> was wondering if someone can state an approximated average time *for a
> production* galaxy installation to do that operation.
>
> I would like to have some empirical number to show on how a production
> deployment[1] could speed things up, as opposed to having individual
> galaxy instances per user in a cluster (as per IT policies):
>
> http://blogs.nopcode.org/brainstorm/2011/08/22/galaxy-on-uppmax-simplified/
>
> Thanks in advance !
> Roman
>
> [1] http://usegalaxy.org/production
> ___
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
>http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
>http://lists.bx.psu.edu/

-- 
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/



[galaxy-dev] Running the doctest unit tests in Galaxy

2011-10-05 Thread Duddy, John
There are several files in Datatypes with doctest tests in them. Is there a 
convenient wrapper script to run them all?
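
For illustration only, a small wrapper of the kind asked about could be built on the standard doctest module. This is a sketch, not a script that ships with Galaxy; the function name run_doctests is made up here, and the module names you pass in are up to you.

```python
import doctest
import importlib


def run_doctests(module_names):
    """Run the doctests of each named module.

    Returns a dict mapping module name to (failed, attempted).
    """
    results = {}
    for name in module_names:
        module = importlib.import_module(name)
        outcome = doctest.testmod(module, verbose=False)
        results[name] = (outcome.failed, outcome.attempted)
    return results


if __name__ == "__main__":
    import sys

    summary = run_doctests(sys.argv[1:])
    for name, (failed, attempted) in sorted(summary.items()):
        print("%s: %d of %d examples failed" % (name, failed, attempted))
    sys.exit(1 if any(failed for failed, _ in summary.values()) else 0)
```

It would be run with the datatype modules as arguments, with Galaxy's lib directory on PYTHONPATH so that the imports resolve.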

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


[galaxy-dev] Tool shed and datatypes

2011-10-05 Thread Duddy, John
Can we introduce new file types via tools in the tool shed? It seems Galaxy can 
load them if they are in the datatypes configuration file. Does tool 
installation automate the editing of that file?


John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


Re: [galaxy-dev] Tool shed and datatypes

2011-10-05 Thread Duddy, John
One of the things we're facing is the sheer size of a whole human genome at 30x 
coverage. An effective way to deal with that is by compressing the FASTQ files. 
That works for BWA and our ELAND, which can directly read a compressed FASTQ, 
but other tools crash when reading compressed FASTQ files. One way to 
address that would be to introduce a new type, for example "CompressedFastQ", 
with a conversion to FASTQ defined. BWA could take both types as input. This 
would allow the best of both worlds - efficient storage and use by all existing 
tools.
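
As a sketch of what the "conversion to FASTQ" step amounts to, the following streams a gzipped FASTQ file out to plain text. It is not wired into Galaxy's converter mechanism, and the function name gzip_fastq_to_fastq is hypothetical.

```python
import gzip
import shutil


def gzip_fastq_to_fastq(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a gzipped FASTQ file to an uncompressed one.

    Copies in fixed-size chunks so memory use stays flat for large files.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Tools that read gzip directly (BWA, ELAND) would take the compressed type as-is; everything else would go through a conversion like this one.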

Another example would be adding the CASAVA tools to Galaxy. Some of the 
statistics generation tools use custom file formats. To be able to make the use 
of those tools optional and configurable, they should be separate from the 
aligner, but that would require that Galaxy be made aware of the custom file 
formats - we'd have to add a datatype.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com<mailto:jdu...@illumina.com>

From: Greg Von Kuster [mailto:g...@bx.psu.edu]
Sent: Wednesday, October 05, 2011 6:25 PM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Tool shed and datatypes

Hello John,

The Galaxy tool shed currently is not enabled to automatically edit the 
datatypes_conf.xml file, although I could add this feature if the need exists.  
Can you elaborate on what you are looking to do regarding this?

Thanks!


On Oct 5, 2011, at 1:52 PM, Duddy, John wrote:


Can we introduce new file types via tools in the tool shed? It seems Galaxy can 
load them if they are in the datatypes configuration file. Does tool 
installation automate the editing of that file?


John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com<mailto:jdu...@illumina.com>


Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu<mailto:g...@bx.psu.edu>




Re: [galaxy-dev] Tool shed and datatypes

2011-10-06 Thread Duddy, John
I'd be up for something like that, although I have other tasking in the 
short term after I finish my parallelism work. I'd rather not have Galaxy do 
the compression/decompression, because that will not effectively utilize the 
distributed nature of many filesystems, such as Isilon, that our customers use. 
My parallelism work (second phase almost done) handles that by using a 
block-gzipped format and index files that allow the compute nodes to seek to 
the blocks they need and extract from there.

Another thing that should probably go along with this is an enhancement to 
metadata such that it can be fed in from the outside. We upload files by 
linking to file paths, and at that point, we know everything about the files 
(index information). So no need to decompress a 500GB file and read the whole 
thing just to count the lines - all you have to do is ask ;-}

 
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Thursday, October 06, 2011 1:28 AM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes

On Thu, Oct 6, 2011 at 4:48 AM, Duddy, John  wrote:
> One of the things we're facing is the sheer size of a whole human genome at
> 30x coverage. An effective way to deal with that is by compressing the FASTQ
> files. That works for BWA and our ELAND, which can directly read a
> compressed FASTQ, but other tools crash when reading compressed FASTQ
> files. One way to address that would be to introduce a new type, for
> example "CompressedFastQ", with a conversion to FASTQ defined. BWA could
> take both types as input. This would allow the best of both worlds -
> efficient storage and use by all existing tools.

We'd discussed this and a more general approach where any file
could be gzipped, but the code to do that doesn't exist yet:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-September/006745.html

Issue filed:
https://bitbucket.org/galaxy/galaxy-central/issue/666/

That seems a better long term solution than the pragmatic short term
solution of fastqsanger-gzip (or whatever it gets called). Note that it
sounded like Edward Kirton might already be using this - you should
be consistent.

The other strong idea from that thread was moving from FASTQ to
unaligned BAM, which is gzip compressed, and has explicit
support for paired end reads, read groups, etc.

Peter



Re: [galaxy-dev] Tool shed and datatypes

2011-10-06 Thread Duddy, John
As I understand it, Isilon is built up from "bricks" that have storage and 
compute power. They replicate files amongst themselves, so that for every IO 
request there are multiple systems that could respond. They are interconnected 
by an ultra fast fibre backbone.

So, depending on your topology, it's possible to get a lot more throughput by 
working on different sections of the same file from different physical 
computers.

I haven't delved into BGZF, so I can't comment. My approach to block GZIP was 
just to concatenate multiple GZIP files and keep a record of the offsets and 
sequences contained in each. The advantage is compatibility, in that it 
decompresses just like it was one big chunk, yet you can compose subsets of the 
data without decompressing/recompressing and (as long as we actually have to 
write out the file subsets) can reap the reduced IO benefits of smaller writes.
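
The approach described above can be sketched as follows. This is my reading of the description, not the actual Illumina code: each block of records is compressed as its own gzip member, the members are concatenated, and an index records where each one starts. The function names and the index layout are assumptions.

```python
import gzip


def write_blocked_gzip(records, records_per_block):
    """Compress records into independent, concatenated gzip members.

    Returns (data, index); each index row is
    (byte_offset, byte_length, first_record, record_count).
    """
    data = bytearray()
    index = []
    for start in range(0, len(records), records_per_block):
        chunk = records[start:start + records_per_block]
        member = gzip.compress(b"".join(chunk))
        index.append((len(data), len(member), start, len(chunk)))
        data += member
    return bytes(data), index


def read_block(data, index, block_number):
    """Decompress a single block without touching the rest of the data."""
    offset, length, _first, _count = index[block_number]
    return gzip.decompress(data[offset:offset + length])


if __name__ == "__main__":
    reads = [("@r%d\nACGT\n+\nIIII\n" % i).encode() for i in range(10)]
    blob, idx = write_blocked_gzip(reads, records_per_block=4)
    # The concatenation still decompresses like one ordinary gzip file.
    assert gzip.decompress(blob) == b"".join(reads)
    print(read_block(blob, idx, 1).decode())
```

On a real file the slice in read_block would be a seek and a bounded read, which is what lets different compute nodes work on different sections of the same file.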

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Thursday, October 06, 2011 8:16 AM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes

On Thu, Oct 6, 2011 at 3:48 PM, Duddy, John  wrote:
> I'd be up for something like that, although I have other tasking
> in the short term after I finish my parallelism work. I'd rather not have
> Galaxy do the compression/decompression, because that will not
> effectively utilize the distributed nature of many filesystems, such
> as Isilon, that our customers use.

Is that like a compressed filesystem, where there is probably less
benefit to storing the data gzipped?

> My parallelism work (second
> phase almost done) handles that by using a block-gzipped
> format and index files that allow the compute nodes to seek to
> the blocks they need and extract from there.

How similar is your block-gzipped approach to BGZF used in BAM?

> Another thing that should probably go along with this is an
> enhancement to metadata such that it can be fed in from the
> outside. We upload files by linking to file paths, and at that
> point, we know everything about the files (index information).
> So no need to decompress a 500GB file and read the whole
> thing just to count the lines - all you have to do is ask ;-}

I can see how that might be useful.

Peter



Re: [galaxy-dev] Tool shed and datatypes

2011-10-06 Thread Duddy, John
GZIP files are definitely our plan. I just finished testing the code that 
distributes the processing of a FASTQ (or pair for PE) to an arbitrary number 
of tasks, where each subtask extracts just the data it needs without reading 
any of the file it does not need. It extracts the blocks of GZIPped data into a 
standalone GZIP file just by copying whole blocks and appending them (if the 
window is not aligned perfectly, there is additional processing). Since the 
entire file does not need to be read, it distributes quite nicely.
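
A rough sketch of the extraction step described above, under the assumption that the index lists (offset, length, first_record, count) for each gzip member. Blocks that fall wholly inside the requested window are copied as compressed bytes; only a block that straddles the window edge is decompressed and trimmed. All names here are hypothetical, and the in-memory slicing stands in for file seeks.

```python
import gzip


def build_blocks(records, per_block):
    """Return (blob, index); index rows are (offset, length, first_record, count)."""
    blob, index = b"", []
    for first in range(0, len(records), per_block):
        chunk = records[first:first + per_block]
        member = gzip.compress(b"".join(chunk))
        index.append((len(blob), len(member), first, len(chunk)))
        blob += member
    return blob, index


def extract_window(blob, index, start, stop, record_lines=4):
    """Return a standalone gzip file holding records [start, stop)."""
    out = b""
    for offset, length, first, count in index:
        last = first + count
        if last <= start or first >= stop:
            continue  # block lies outside the window and is never read
        raw = blob[offset:offset + length]
        if start <= first and last <= stop:
            out += raw  # whole block is wanted: copy the compressed bytes
        else:
            # Window edge falls inside this block: decompress, trim, recompress.
            lines = gzip.decompress(raw).splitlines(keepends=True)
            lo = max(start - first, 0) * record_lines
            hi = (min(stop, last) - first) * record_lines
            out += gzip.compress(b"".join(lines[lo:hi]))
    return out
```

With ten records in blocks of four, extract_window(blob, index, 5, 10) trims the second block and copies the third untouched, so the amount of recompression is bounded by two blocks per task regardless of file size.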

I'll be preparing a pull request for it soon.


John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Thursday, October 06, 2011 9:19 AM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes

On Thu, Oct 6, 2011 at 5:02 PM, Duddy, John  wrote:
> As I understand it, Isilon is built up from "bricks" that have storage
> and compute power. They replicate files amongst themselves, so
> that for every IO request there are multiple systems that could
> respond. They are interconnected by an ultra fast fibre backbone.

So why not use gzipped files on top of that? Smaller chunks of
data to access so should be faster even with the decompression
once it gets to the CPU.

> So, depending on your topology, it's possible to get a lot more
> throughput by working on different sections of the same file from
> different physical computers.

Nice.

> I haven't delved into BGZF, so I can't comment. My approach to
> block GZIP was just to concatenate multiple GZIP files and keep
> a record of the offsets and sequences contained in each. The
> advantage is compatibility, in that it decompresses just like it
> was one big chunk, yet you can compose subsets of the data
> without decompressing/recompressing and (as long as we
> actually have to write out the file subsets) can reap the reduced
> IO benefits of smaller writes.

That sounds VERY similar to BGZF - have a read over the
SAM specification which covers this. Basically they stick
the block size into the gzip headers, and the BAM index files
(BAI) use a 64 bit offset which is split into the BGZF block
offset and the offset within that decompressed block. See:
http://samtools.sourceforge.net/SAM1.pdf
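
For reference, the 64-bit offset split mentioned above can be sketched like this. Per the SAM specification, the upper 48 bits hold the file offset of the start of the BGZF block and the lower 16 bits hold the offset within the decompressed block; the function names are mine.

```python
WITHIN_BLOCK_BITS = 16
WITHIN_BLOCK_MASK = (1 << WITHIN_BLOCK_BITS) - 1


def make_virtual_offset(block_start, within_block):
    """Pack a BGZF block start and an in-block offset into one integer."""
    if not 0 <= within_block <= WITHIN_BLOCK_MASK:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < (1 << 48):
        raise ValueError("block start must fit in 48 bits")
    return (block_start << WITHIN_BLOCK_BITS) | within_block


def split_virtual_offset(virtual_offset):
    """Return (block_start, within_block) for a packed virtual offset."""
    return virtual_offset >> WITHIN_BLOCK_BITS, virtual_offset & WITHIN_BLOCK_MASK
```

The 16-bit lower field is the reason BGZF blocks are capped at 64kb of uncompressed data.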

Peter



Re: [galaxy-dev] Tool shed and datatypes

2011-10-10 Thread Duddy, John
datatypes for each 
cluster, so long as they are all guaranteed to be mutually implicitly 
convertible.  For a policy to manage *Shed rot, the most direct approach is to 
moderate and require approval for each submission, but I could imagine that 
responsibility quickly overwhelming the poor team responsible.  Unless I 
drastically overestimate the frequency with which submissions might be made 
(which is entirely possible), that poor team's operations could wind up looking 
not unlike the USPTO.

Anyway, my general point is that there are many non-trivial factors to consider 
in the question of creating a TypeShed.  But, if done right, the benefits could 
be huge, besides the likely awesomeness of the engineering involved.

Finally, let me echo Greg again, and say to please send additional thought and 
feedback.  What do you think about the points I raised?  What else is there to 
consider that hasn't occurred to me yet?  What would be the benefits and 
potential pitfalls?

Best,
Eric


From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
on behalf of Jim Johnson [j...@umn.edu]
Sent: Friday, October 07, 2011 2:06 PM
To: galaxy-dev@lists.bx.psu.edu
Cc: Greg Von Kuster
Subject: Re: [galaxy-dev] Tool shed and datatypes

Greg,

It would be great if there were a way to expand upon the core datatypes using 
the ToolShed.

Would it be possible to have a separate datatype repository within the ToolShed?

Datatype
   name=""
   description=""
   datatype_dependencies=[]
   definition=


The tool config could be expanded to have requirement for datatypes.
ssmap




Table datatype
   Column    |          Type          | Modifiers
-------------+------------------------+-------------------------------------------------------
 id          | integer                | not null default nextval('datatype_id_seq'::regclass)
 name        | character varying(255) |
 version     | character varying(40)  |
 description | text                   |
 definition  | text                   |
UNIQUE (name)

Table datatype_datatype_association
   Column    |  Type   | Modifiers
-------------+---------+-------------------------------------------------------
 id          | integer | not null default nextval('datatype_id_seq'::regclass)
 datatype_id | integer |
 requires_id | integer |
FOREIGN KEY (datatype_id) REFERENCES datatype(id)
FOREIGN KEY (requires_id) REFERENCES datatype(id)


Then for my mothur metagenomics tools I could define:

name="ssmap"   description="Secondary Structure Map"  version="1.0"  
datatype_dependencies=[tabular]
definition=
from galaxy.datatypes.tabular import Tabular

class SecondaryStructureMap(Tabular):
    file_ext = 'ssmap'

    def __init__(self, **kwd):
        """Initialize secondary structure map datatype"""
        Tabular.__init__( self, **kwd )
        self.column_names = ['Map']

    def sniff( self, filename ):
        """
        Determines whether the file is in secondary structure map format:
        a single column with an integer value that indicates the row that
        this row maps to.
        Check that the mapping is symmetric: if structMap[10] = 380,
        then structMap[380] = 10.
        """
        ...
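
The body of sniff() is elided above. Purely as an illustration of the symmetry check the docstring describes, a standalone helper might look like the sketch below. It is not the mothur or Galaxy implementation; the name is_symmetric_map is made up, rows are taken to be 1-based, and treating 0 as "unpaired" is an assumption.

```python
def is_symmetric_map(values):
    """Check that a secondary structure map is symmetric.

    values[i - 1] is the 1-based row that row i maps to.
    A value of 0 is assumed to mean the row is unpaired.
    """
    size = len(values)
    for row, partner in enumerate(values, start=1):
        if partner == 0:
            continue  # unpaired row (assumed convention)
        if not 1 <= partner <= size:
            return False  # points outside the file
        if values[partner - 1] != row:
            return False  # partner does not point back
    return True
```

A sniffer built on this would first need to confirm that the file has a single integer column, which is left out here.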




Then the align.check.xml tool_config could require the 'ssmap' datatype:


  Calculate the number of potentially misaligned bases
  mothur
  ssmap

> John,
>
> I've been following this message thread, and it seems it's gone in a 
> direction that differs from your initial question about the possibility for 
> Galaxy to handle automatic editing of the datatypes_conf.xml file when 
> certain Galaxy tool shed tools are automatically installed.  There are some 
> complexities to consider in attempting this.  One of the issues to consider 
> is that the work for adding support for a new datatype to Galaxy lies outside 
> of the intended function of the tool shed.  If new support is added to the 
> Galaxy code base, an entry for that new datatype should be manually added to 
> the table at the same time.  There may be benefits to enabling automatic 
> changes to datatype entries that already exist in the file (e.g., adding a 
> new converter for an existing datatype entry), but perhaps adding a 
> completely new datatype to the file may not be appropriate.  I'll continue to 
> think about this - send additional thought and feedback, as doing so is 
> always helpful
>
> Thanks!
>
> Greg
>
>
> On Oct 5, 2011, at 11:48 PM, Duddy, John wrote:
>
>> One of the things we're facing is the sheer size of a

Re: [galaxy-dev] What's causing this error?

2011-10-17 Thread Duddy, John
You mention that you moved it to an NFS volume - but it seems you also moved to 
a grid configuration using PBS?

If that's the case, what you are seeing might be an issue with NFS attribute 
caching or write caching, which causes files created from one machine to not 
appear until some time later (from the perspective of other machines). The PBS 
job notifications are not impacted by the filesystem latencies.

You can prove this by experiment if you alter the finish_job method in 
lib/galaxy/jobs/runners/pbs.py to do a sleep/wait loop, waiting up to 60 
seconds for the files to be readable. If that hack works, latency is your 
problem.
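
The wait loop suggested above could be sketched as a standalone helper like this. It is not the code in pbs.py; the function name and parameters are hypothetical, and where exactly it would be called from finish_job is left open.

```python
import os
import time


def wait_for_readable(paths, timeout=60.0, interval=1.0):
    """Poll until every path is readable or the timeout expires.

    Returns True if all paths became readable, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.access(p, os.R_OK)]
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If a call such as wait_for_readable([stdout_file, stderr_file]) makes the error go away, that points at filesystem latency rather than a lost job. Note that the client may also cache the negative lookup, so this is a diagnostic aid rather than a guarantee.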

The solution is either to:

-  Configure your mounts not to use attribute caching (has performance 
impacts), or

-  Make the hack permanent.

This happened to us on SGE, which is why I know these details ;-}

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Luobin Yang
Sent: Monday, October 17, 2011 10:31 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] What's causing this error?

Hi,

Recently I moved my locally installed Galaxy from a local hard drive to an NFS 
mounted hard drive. When I run some tools, I get the following error in the 
log file:

Job output not returned by PBS: the output datasets were deleted while the job 
was running, the job was manually dequeued or there was a cluster error.

I am pretty sure the job was not manually dequeued. Any idea how this happened 
and how this can be fixed?

Thanks,
Luobin


Re: [galaxy-dev] Looks like actual user breaks splitting

2011-11-02 Thread Duddy, John
The datatype you are using does not define a split method. Are you working with 
our in-progress gz type or fastqillumina?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com<mailto:jdu...@illumina.com>

From: Chorny, Ilya
Sent: Wednesday, November 02, 2011 11:50 AM
To: Duddy, John
Cc: Nate Coraor (n...@bx.psu.edu); galaxy-dev@lists.bx.psu.edu
Subject: Looks like actual user breaks splitting

Hey John,

Any thoughts?

Ilya

Traceback (most recent call last):
  File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/jobs/runners/tasks.py", 
line 73, in run_job
tasks = splitter.do_split(job_wrapper)
  File 
"/home/galaxy/ichorny/galaxy-central/lib/galaxy/jobs/splitters/multi.py", line 
73, in do_split
input_type.split(input_datasets, get_new_working_directory_name, 
parallel_settings)
  File "/home/galaxy/ichorny/galaxy-central/lib/galaxy/datatypes/data.py", line 
473, in split
raise Exception("Text file splitting does not support multiple files")
Exception: Text file splitting does not support multiple files

Ilya Chorny Ph.D.
Bioinformatics Scientist I
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Work: 858.202.4582
Email: icho...@illumina.com<mailto:icho...@illumina.com>
Website: www.illumina.com<http://www.illumina.com>



Re: [galaxy-dev] Looks like actual user breaks splitting

2011-11-02 Thread Duddy, John
I'll submit a pull request shortly...

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu] 
Sent: Wednesday, November 02, 2011 12:24 PM
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting

John, Ilya,

I get further with sequence type inputs but it looks like
JobWrapper.get_output_datasets_and_fnames() is not returning the right
thing when outputs_to_working_directory = True

BTW, the base Data.split() method is broken after the updates to
Sequence.split() since it wasn't updated to expect
HistoryDatasetAssociations rather than filenames.  Could you take a look
at that when you get a chance?

--nate



Re: [galaxy-dev] Looks like actual user breaks splitting

2011-11-03 Thread Duddy, John
I'm not following you - it's been 6 months since I wrote that code ;-}

It looks to me like a DatasetPath() object is always placed in that array, and 
with one exception near then, it looks like the change I made generates those 
objects the same way.

Do you have a stack trace for the merge problem I can look at?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu] 
Sent: Thursday, November 03, 2011 2:22 PM
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting

Hi John,

It looks like the first issue is related to the change from
get_output_fnames() -> compute_outputs().  When
outputs_to_working_directory = False (default) this method
stores/returns a HistoryDatasetAssociation, but when True,
stores/returns a Dataset (the original method's behavior).  Thus,
accessing the object's .datatype attribute in the splitter's do_merge()
fails.

Thanks,
--nate



Re: [galaxy-dev] Looks like actual user breaks splitting

2011-11-03 Thread Duddy, John
I just came to that conclusion myself. If you need me to do anything, let me 
know - but it sounds like you have a handle on it.

Muchas gracias.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-Original Message-
From: Nate Coraor (n...@bx.psu.edu) [mailto:n...@bx.psu.edu] 
Sent: Thursday, November 03, 2011 3:53 PM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Looks like actual user breaks splitting

Nate Coraor (n...@bx.psu.edu) wrote:
> Duddy, John wrote:
> > I'm not following you - it's been 6 months since I wrote that code ;-}
> 
> I know the feeling!
> 
> > It looks to me like a DatasetPath() object is always placed in that array, 
> > and with one exception near then, it looks like the change I made generates 
> > those objects the same way.
> 
> It's creating a dict in self.output_dataset_paths, and that dict looks
> like this when outputs_to_working_directory = False:
> 
> { output_param_name : [ HDA, DatasetPath ], ... }
> 
> And this when True:
> 
> { output_param_name : [ Dataset, DatasetPath ], ... }
> 
> > Do you have a stack trace for the merge problem I can look at?
> 
> If you put this in do_merge()'s except block:
> 
> log.exception( stdout )
> 
> You'll get:
> 
> Traceback (most recent call last):
>   File 
> "/space/nate/galaxy-central-ichorny/lib/galaxy/jobs/splitters/multi.py", line 
> 128, in do_merge
> output_type = outputs[output][0].datatype
> AttributeError: 'Dataset' object has no attribute 'datatype'
> 
> I could just change both methods to put an HDA in the list inside the
> dict there, but I haven't looked much into what output_dataset_paths is
> used for, so I wasn't sure what that might break.

Sorta answered it myself, it looks like you created this precisely for
do_merge(), so changing it to contain the HDA fixes the problem (and
shouldn't break anything else).
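
To illustrate the failure and the fix discussed above without pulling in Galaxy, here is a toy reproduction. The classes are stand-ins, not Galaxy's model classes, and output_types only mimics the outputs[output][0].datatype lookup from the traceback.

```python
class Dataset(object):
    """Stand-in for the raw dataset object: it has no datatype attribute."""


class HDA(object):
    """Stand-in for a HistoryDatasetAssociation, which knows its datatype."""

    def __init__(self, datatype):
        self.datatype = datatype
        self.dataset = Dataset()


def output_types(outputs):
    """Mimic do_merge(): read .datatype from the first item of each pair."""
    return dict((name, pair[0].datatype) for name, pair in outputs.items())


if __name__ == "__main__":
    hda = HDA("fastqsanger")
    # Shape with the HDA stored first: the lookup succeeds.
    print(output_types({"out1": [hda, "/path/placeholder"]}))
    # Shape with the bare Dataset stored first: the lookup fails.
    try:
        output_types({"out1": [hda.dataset, "/path/placeholder"]})
    except AttributeError as exc:
        print("merge would fail: %s" % exc)
```

Storing the HDA in both code paths, as concluded above, keeps the first element's type the same whether or not outputs_to_working_directory is set.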

--nate

> 
> Thanks,
> --nate
> 

Re: [galaxy-dev] Tool shed and datatypes

2011-11-08 Thread Duddy, John
It's not public yet, and it involves a little conundrum - we want it so we can 
support large amounts of data efficiently on a variety of aligners, including 
our ELAND from CASAVA. However, ELAND does not support unaligned BAM inputs 
yet, and apparently it would be a lot of work to make it so (and another team's 
area of responsibility as well). So in the near term, BGZF would not meet our 
needs.

However, work is quite far along on a GZIP-based one that works with ELAND and 
BWA, since they both read GZIP FASTQ files, and works/will work with a 
converter to fastq_sanger for other tools.

I can put you in touch with the engineer doing the work if you are interested.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-----Original Message-----
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Tuesday, November 08, 2011 3:29 PM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes

On Thu, Oct 6, 2011 at 5:45 PM, Duddy, John wrote:
> GZIP files are definitely our plan. I just finished testing the code
> that distributes the processing of a FASTQ (or pair for PE) to an
> arbitrary number of tasks, where each subtask extracts just the
> data it needs without reading any of the file it does not need. It
> extracts the blocks of GZIPped data into a standalone GZIP file
> just by copying whole blocks and appending them (if the window
> is not aligned perfectly, there is additional processing). Since
> the entire file does not need to be read, it distributes quite nicely.
>
> I'll be preparing a pull request for it soon.
>
>
> John Duddy

Hi John,

Is your pull request public yet? I'd like to know more about
your GZIP-based plan (and how it differs from BGZF). It
would seem silly to reinvent something slightly different
if an existing and well-tested mechanism like BGZF (used
in BAM files) would work.

BGZF is based on GZIP with blocks of up to 64 KB each,
where the block size is recorded in the GZIP block
header. This may be more fine-grained than the block
sizes you are using, but should serve equally well for
distribution of data chunks between machines/cores.
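
To make the "block size in the header" point concrete, here is a minimal
Python sketch. It assumes the BGZF layout given in the SAM specification: a
GZIP member whose extra field carries a "BC" subfield holding the total block
size minus one. The function names make_bgzf_block and bgzf_block_sizes are
hypothetical and come from no library, and the sketch omits the empty
end-of-file block that real BGZF writers append.

```python
import gzip
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Wrap payload in one BGZF-style GZIP member (illustrative only)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib header
    cdata = comp.compress(payload) + comp.flush()
    total = 12 + 6 + len(cdata) + 8  # fixed header + extra field + data + trailer
    if total > 65536:
        raise ValueError("payload too large for a single block")
    # ID1, ID2, CM=deflate, FLG=FEXTRA, MTIME, XFL, OS=unknown, XLEN
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    # Subfield 'B','C' with a 2-byte value: total block size minus 1
    extra = struct.pack("<BBHH", 66, 67, 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def bgzf_block_sizes(stream: bytes) -> list:
    """Walk a stream block by block using only the header size field."""
    sizes = []
    offset = 0
    while offset < len(stream):
        xlen = struct.unpack_from("<H", stream, offset + 10)[0]
        pos, end, bsize = offset + 12, offset + 12 + xlen, None
        while pos < end:
            si1, si2, slen = struct.unpack_from("<BBH", stream, pos)
            if (si1, si2) == (66, 67):
                bsize = struct.unpack_from("<H", stream, pos + 4)[0]
            pos += 4 + slen
        if bsize is None:
            raise ValueError("no BC subfield: not a BGZF block")
        sizes.append(bsize + 1)
        offset += bsize + 1  # jump to the next block without inflating
    return sizes


payloads = [b"@read1\nACGT\n+\nIIII\n", b"@read2\nTTGA\n+\nIIII\n"]
blocks = [make_bgzf_block(p) for p in payloads]
stream = b"".join(blocks)
sizes = bgzf_block_sizes(stream)
```

Because each block is a complete GZIP member, an ordinary GZIP reader can
still decompress the whole stream, and any run of whole blocks can be copied
out as a standalone file.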

I appreciate that the SAM/BAM specification where BGZF is
defined is quite dry reading, and the broad potential of
this GZIP variant beyond BAM is not articulated clearly.
So I've written a blog post about how BGZF can be used
for efficient random access to sequential files (in the
sense of one self-contained record after another, e.g.
many sequence file formats including FASTA & FASTQ):

http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

I've also added a reference to BGZF on the open
Galaxy feature request for general support of gzipped
data types:

https://bitbucket.org/galaxy/galaxy-central/issue/666/

Regards,

Peter

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Tool shed and datatypes

2011-11-08 Thread Duddy, John
BTW - the pull request for the GZIP-based splitting is actually integrated - I 
was referring to the GZIP-based datatype.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


Re: [galaxy-dev] Tool shed and datatypes

2011-11-08 Thread Duddy, John
Ahh - sorry. I finally found the format specification for BGZF in the SAM 
format specification, and it seems that it is 100% GZIP-compatible. There is 
still the issue of needing an external file index, since all BGZF seems to give 
you is the size of the compressed block, not anything format-specific, like the 
number of sequences in the block.

In any case, whether it's GZIP or BGZF, it seems the solutions are very 
similar, and porting my work should be pretty simple - I just used larger 
blocks and put all the data in the index file and none in the headers.
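
The external-index approach can be sketched in a few lines of Python. This is
only an illustration of the idea, not the code from the pull request: the
function names, the tuple layout of the index, and the block size of four
records are all assumptions chosen for the example. Each group of records is
compressed as its own GZIP member, and the byte offset, byte length, and
record count of every member are kept outside the file.

```python
import gzip


def write_blocked_gzip(records, records_per_block):
    """Compress records in groups, one GZIP member per group.

    Returns (data, index); index holds one
    (byte_offset, byte_length, record_count) tuple per member.
    """
    data = bytearray()
    index = []
    for start in range(0, len(records), records_per_block):
        group = records[start:start + records_per_block]
        member = gzip.compress(b"".join(group))
        index.append((len(data), len(member), len(group)))
        data += member
    return bytes(data), index


def extract_blocks(data, index, first_block, last_block):
    """Copy members first_block..last_block (inclusive) without inflating them."""
    start = index[first_block][0]
    end = index[last_block][0] + index[last_block][1]
    return data[start:end]


# Ten toy FASTQ records, four per block -> blocks of 4, 4 and 2 records.
records = [b"@read%d\nACGT\n+\nIIII\n" % i for i in range(10)]
data, index = write_blocked_gzip(records, records_per_block=4)

# A subtask that needs records 4..9 copies blocks 1 and 2 byte for byte.
chunk = extract_blocks(data, index, 1, 2)
```

The copied chunk is itself a valid GZIP file, so a tool that reads GZIP FASTQ
can consume it directly. A request that does not fall on a block boundary
would need the extra processing mentioned earlier in the thread, which this
sketch leaves out.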

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-----Original Message-----
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Tuesday, November 08, 2011 4:04 PM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes

On Tue, Nov 8, 2011 at 11:45 PM, Duddy, John wrote:
> It's not public yet, and it involves a little conundrum - we want
> it so we can support large amounts of data efficiently on a variety
> of aligners, including our ELAND from CASAVA. However, ELAND
> does not support unaligned BAM inputs yet, and apparently it
> would be a lot of work to make it so (and another team's area
> of responsibility as well).

OK, so using (unaligned) BAM isn't about to happen.

> So in the near term, BGZF would not meet our needs.
>

I don't follow you there, BAM != BGZF.

We can use BGZF to compress FASTQ, FASTA, GenBank,
basically anything. You get compression approaching that
of plain GZIP (depending on the characteristics of the data)
plus efficient random access.

> However, work is quite far along on a GZIP-based one
> that works with ELAND and BWA, since they both read
> GZIP FASTQ files, and works/will work with a converter
> to fastq_sanger for other tools.
>
> I can put you in touch with the engineer doing the work if
> you are interested.

That might be a good idea, or ask them to post here?

Peter


Re: [galaxy-dev] Galaxy in a frame?

2011-11-14 Thread Duddy, John
You can put it in an iframe, and if you are serving it from the same host as 
your main app, you can also inject JavaScript/CSS to control certain aspects, 
such as styling and the propagation of navigation events.

However, this is not a trivial exercise.


John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com

-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Oren Livne
Sent: Friday, November 11, 2011 5:32 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Galaxy in a frame?

Dear All,

We are trying to integrate Galaxy with our web app. Our plan is to keep our 
web app as the main front end and let users open an HTML frame with Galaxy 
inside it. However, Galaxy seems to make this impossible: it takes over the 
entire page when a user logs in or clicks on most links. Is it possible to 
embed Galaxy in another web app, or is it designed only as a standalone app?

Thanks so much,
Oren


[galaxy-dev] Bootstrapping the Galaxy installation process: populating the database

2011-11-15 Thread Duddy, John
We want to automate certain aspects of setup. We already create the target 
database and put it in universe_wsgi.ini, but we want to plug some values into 
the tables during installation.

Currently, the schema is created the first time Galaxy is run. I did not see a 
way to cause that to happen via the manage_db.sh script. Have I missed 
something? If not, any pointers on how I might go about adding that (i.e. what 
stuff to call to just initialize the schema)?

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
E-mail: jdu...@illumina.com
