On Jul 8, 2013, at 3:33 PM, Peter Cock wrote:

> On Mon, Jul 8, 2013 at 11:21 PM, Robert Baertsch <rbaer...@ucsc.edu> wrote:
>> I respectfully disagree. If you want an extensible system, you should
>> always wrap primitive system level calls.
>> 
>> Any tool that opens a file that could be compressed would be affected.
>> That is a huge number of tools. Do you really want a cottage industry of
>> tools that have different methods of dealing with compression?
> 
> But defining a Python helper function within the Galaxy Python
> libraries doesn't achieve that.
> 
> Are you talking about patching the OS level POSIX open functions
> or something?

No.
> The tools available in Galaxy are written in a range
> of languages including C, Perl, R, etc. Yes, some are in Python,
> but of those most are independent of Galaxy and can be used
> separately from Galaxy.
The helper function would have to be ported to R. We are talking about how
Galaxy handles compressed data. Once we decide that, we can determine how best
to implement it.

Proposal: Do not treat compressed data as a separate data type. Treat it as an
independent attribute that can be applied to any data. Otherwise you will have
to create a gzip, zip and bz2 type for every type that you want to compress.
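
To make the difference concrete, here is a rough sketch in Python. The Dataset
class, its field names and the short lists of types are made up for
illustration; this is not Galaxy's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative lists only; the real sets of formats are larger.
DATATYPES = ["fasta", "sam", "gff3", "tabular"]
COMPRESSIONS = [None, "gzip", "bz2", "zip"]


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset record; not Galaxy's actual data model."""
    path: str
    datatype: str                      # what the content is
    compression: Optional[str] = None  # how the bytes are stored


# Separate-type approach: one type name per (datatype, compression) pair.
combined_types = [
    d if c is None else f"{d}.{c}" for d in DATATYPES for c in COMPRESSIONS
]

# Attribute approach: tools only ever need to know the datatypes.
print(len(combined_types), "combined types versus", len(DATATYPES), "datatypes")
```

With the attribute approach, adding a new compression format means one more
allowed value for the attribute rather than one more type per datatype.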

People can use the Python helpers or write their own in other languages.
 
We need a galaxy_open function to hide details of compression from tool 
developers.

We could also open http files or pipes without any changes to tools (other
than changing open() to galaxy_open()).
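
A minimal sketch of what I mean by galaxy_open, assuming Python 3 and reading
only. It sniffs the leading magic bytes to pick a decompressor. The function
does not exist in Galaxy today, and http and pipe support are left out to keep
it short.

```python
import bz2
import gzip

# Leading "magic" bytes that identify each compression format.
_GZIP_MAGIC = b"\x1f\x8b"
_BZ2_MAGIC = b"BZh"


def galaxy_open(path, mode="rt"):
    """Hypothetical helper: open `path` for reading, decompressing if needed.

    Sketch only. It handles read modes, and recognises gzip and bzip2 by
    their magic bytes; anything else is opened as a plain file.
    """
    if "r" not in mode:
        raise ValueError("this sketch only supports read modes")
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic.startswith(_GZIP_MAGIC):
        return gzip.open(path, mode)
    if magic.startswith(_BZ2_MAGIC):
        return bz2.open(path, mode)
    return open(path, mode)
```

A tool would then replace open(path) with galaxy_open(path) and read lines as
before, whether or not the input was compressed.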

> 
>> Encoding the gzip status in the datatype will create an explosion of
>> datatypes. Compression is not actually a datatype, it tells you nothing
>> about the content data that is stored in the file.
> 
> What we'd previously discussed was a dual system, holding
> the file type as now (e.g. FASTA, SAM, GFF3, etc) and any
> compression (e.g., None, normal GZIP, BGZF which is a
> GZIP variant, BZIP2, etc).

What about tabular? Should we create tab.gz, tab.bz2 and tab.zip also?

This will quickly get out of hand and create a mess for tool developers who
need to support all these types.

The tool code and tool XML should be written to handle uncompressed data, and
Galaxy should handle the details of decompression. This is not hard to do.
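
One possible way for Galaxy to do this for tools that are not written in
Python is to stage an uncompressed copy before running the tool. The sketch
below is only an illustration: stage_uncompressed and its compression argument
are hypothetical names, not an existing Galaxy function.

```python
import bz2
import gzip
import shutil
import tempfile

# Map from the (hypothetical) compression attribute to a stdlib opener.
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open}


def stage_uncompressed(path, compression=None):
    """Hypothetical staging step: return a path holding uncompressed data.

    Uncompressed input is returned unchanged. Compressed input is
    decompressed into a temporary file that the caller must delete.
    """
    if compression is None:
        return path
    if compression not in _OPENERS:
        raise ValueError("unsupported compression: %r" % (compression,))
    with _OPENERS[compression](path, "rb") as src:
        with tempfile.NamedTemporaryFile(delete=False) as dst:
            shutil.copyfileobj(src, dst)  # streams in chunks
    return dst.name
```

The tool command line would then receive the returned path, so a C, Perl or R
tool sees plain uncompressed input and needs no changes.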
> 
> Galaxy tool wrappers currently define input files with a list
> of file types - they'd also have to give a list of supported
> compression types (defaulting to none). Likewise for any
> output files - if they are already compressed the XML for
> the tool wrapper would have to tell Galaxy this.
> 
>> It is up to the galaxy team to provide a standard way to interact
>> with compressed files.
> 
> That is my preference too - although this could be driven by
> the Galaxy community rather than the core team? I see
> defining new datatypes like 'gzippedfastq' as a stop gap
> special case (but a very practical route for now).
> 
>> My proposed solution is a very small change that could
>> be phased in over time. Any tool that uses open would not support
>> compressed files, but it would not break on uncompressed files.
>> 
>> Do others have an opinion?
> 
> Either I don't understand your plan, or it would only help in
> a tiny minority of cases.
> 
> Regards,
> 
> Peter


