Re: [galaxy-dev] gzipped fastq reader
On Jul 8, 2013, at 3:33 PM, Peter Cock wrote:
> On Mon, Jul 8, 2013 at 11:21 PM, Robert Baertsch wrote:
>> I respectfully disagree. If you want an extensible system, you should always wrap primitive system-level calls.
>>
>> Any tool that opens a file that could be compressed would be affected. That is a huge number of tools. Do you really want a cottage industry of tools that have different methods of dealing with compression?
>
> But defining a Python helper function within the Galaxy Python libraries doesn't achieve that.
>
> Are you talking about patching the OS level POSIX open functions or something?

No.

> The tools available in Galaxy are written in a range of languages including C, Perl, R, etc. Yes, some are in Python, but of those most are independent of Galaxy and can be used separately from Galaxy.

The helper function would have to be ported to R. We are talking about how Galaxy handles compressed data. Once we decide that, we can determine how to best implement it.

Proposal: Do not treat compressed data as a separate data type. Treat it as an independent attribute that can be applied to any data. Otherwise you will have to create a gzipped, zip and bz2 type for every type that you want to compress. People can use the Python helpers or write their own in other languages. We need a galaxy_open function to hide the details of compression from tool developers. We could also open HTTP files or pipes without any changes to tools (other than changing open() to galaxy_open()).

>> Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content data that is stored in the file.
>
> What we'd previously discussed was a dual system, holding the file type as now (e.g. FASTA, SAM, GFF3, etc.) and any compression (e.g. none, normal GZIP, BGZF which is a GZIP variant, BZIP2, etc.).

What about tabular? Should we create tab.gz, tab.bz2 and tab.zip also? This will quickly get out of hand and create a mess for tool developers who need to support all these types. The tool code and tool XML should be written to handle uncompressed data, and Galaxy should handle the details of decompression. This is not hard to do.

> Galaxy tool wrappers currently define input files with a list of file types - they'd also have to give a list of supported compression types (defaulting to none). Likewise for any output files - if they are already compressed, the XML for the tool wrapper would have to tell Galaxy this.
>
>> It is up to the Galaxy team to provide a standard way to interact with compressed files.
>
> That is my preference too - although this could be driven by the Galaxy community rather than the core team? I see defining new datatypes like 'gzippedfastq' as a stop-gap special case (but a very practical route for now).
>
>> My proposed solution is a very small change that could be phased in over time. Any tools that use open would not support compressed files, but they would not break on uncompressed files.
>>
>> Do others have an opinion?
>
> Either I don't understand your plan, or it would only help in a tiny minority of cases.
>
> Regards,
>
> Peter

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
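Robert's galaxy_open proposal is meant to cover pipes as well as files, and a pipe cannot be rewound after its first bytes are read to check for compression. A minimal Python 3 sketch of one way around that is below; `open_stream` is a hypothetical name (not an existing Galaxy or bx-python function), and only gzip and bzip2 are recognised.

```python
import bz2
import gzip
import io


def open_stream(raw):
    """Hypothetical helper: wrap a readable binary stream (a file, a pipe,
    ...) so that gzip or bzip2 data is decompressed transparently.

    The magic number is inspected with peek(), which does not consume any
    bytes, so the stream never needs to be rewound.
    """
    stream = raw if isinstance(raw, io.BufferedReader) else io.BufferedReader(raw)
    # peek() can return fewer bytes than asked for (e.g. a slow pipe);
    # this sketch does not retry in that case
    head = stream.peek(4)[:4]
    if head.startswith(b"\x1f\x8b"):  # gzip (BGZF files also start this way)
        return gzip.GzipFile(fileobj=stream, mode="rb")
    if head.startswith(b"BZh"):  # bzip2
        return bz2.BZ2File(stream)
    return stream  # not compressed: hand back the buffered stream unchanged
```

The `stdout` of a `subprocess.Popen(..., stdout=subprocess.PIPE)` call is already a `BufferedReader`, so it could be passed in directly. This covers reading only; writing would need a separate decision about which compression to apply.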
Re: [galaxy-dev] gzipped fastq reader
I will implement this if the Galaxy team likes the approach. We did this in the UCSC Genome Browser code years ago: a single open_helper call handles gzip, HTTP, FTP and pipes. There is no need to care about how the data is compressed or where the data resides. Wouldn't it be great to be able to pipe data between workflow steps rather than writing to disk? I admit that this will require some work, but the first step is to abstract the open.

On Jul 9, 2013, at 10:38 AM, Peter Cock wrote:
> On Tue, Jul 9, 2013 at 5:53 PM, Robert Baertsch wrote:
>> On Jul 8, 2013, at 3:33 PM, Peter Cock wrote:
>>> The tools available in Galaxy are written in a range of languages including C, Perl, R, etc. Yes, some are in Python, but of those most are independent of Galaxy and can be used separately from Galaxy.
>>
>> The helper function would have to be ported to R. We are talking about how Galaxy handles compressed data. Once we decide that, we can determine how to best implement it.
>
> Individual tools called from Galaxy read and create the files - and we can't usually control them at this level (modifying them all to call a Galaxy managed file open mechanism is not an option).
>
>> Proposal: Do not treat compressed data as a separate data type. Treat it as an independent attribute that can be applied to any data. Otherwise you will have to create a gzipped, zip and bz2 type for every type that you want to compress.
>
> That's what I've been saying - the fact that some people are already using a new gzipped FASTQ format within their Galaxy instances is practical, but I view it as a short term solution only.
>
>>>> Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content data that is stored in the file.
>>>
>>> What we'd previously discussed was a dual system, holding the file type as now (e.g. FASTA, SAM, GFF3, etc.) and any compression (e.g. none, normal GZIP, BGZF which is a GZIP variant, BZIP2, etc.).
>>
>> What about tabular? Should we create tab.gz, tab.bz2 and tab.zip also?
>
> Note ZIP is a bit different, as it is often a multiple file bundle - it behaves differently from GZIP, BGZF, XZ, BZIP2 etc. in that regard.
>
> But otherwise, yes. As a specific example, the tabix tool uses BGZF compressed tabular data to combine compression and efficient random access. This would be useful for many annotation files (e.g. GTF, GFF3).
>
>> This will quickly get out of hand and create a mess for tool developers who need to support all these types.
>
> Why? Individual tool developers don't need to know if Galaxy is keeping the original data file on disk compressed - unless the tool XML says otherwise, Galaxy would hide this detail and call the tool with an uncompressed input file.
>
> (A Unix named pipe which decompresses the file on the fly would be a potential alternative - but only if the tool XML was marked up to say that an input could be streamed. The default must be to assume potential random access to the input files.)
>
>> The tool code and tool XML should be written to handle uncompressed data, and Galaxy should handle the details of decompression. This is not hard to do.
>
> It isn't trivial either ;)
>
> Peter
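Peter's named-pipe alternative, quoted above, is also one route to Robert's idea of piping data between steps instead of writing it to disk. On a Unix system it could look roughly like the Python 3 sketch below. `stream_via_fifo` is a hypothetical helper, and it assumes the consuming tool reads its input strictly front to back, because a FIFO cannot seek.

```python
import gzip
import os
import shutil
import tempfile
import threading


def stream_via_fifo(gz_path):
    """Hypothetical helper (Unix only): expose a gzip file as a named pipe.

    Returns the FIFO path and the feeder thread. A tool can open the path
    like an ordinary uncompressed file, but can only read it sequentially.
    """
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "input.fifo")
    os.mkfifo(fifo_path)

    def feed():
        # Opening the write end blocks until a reader opens the FIFO
        with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Both ends were opened by now, so the directory entry can go
        os.remove(fifo_path)
        os.rmdir(tmpdir)

    thread = threading.Thread(target=feed, daemon=True)
    thread.start()
    return fifo_path, thread
```

If the reader stops early, the feeder thread will hit a `BrokenPipeError` and leave the FIFO behind; a real implementation would need to handle that, and would only use this path for inputs the tool XML marks as streamable.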
Re: [galaxy-dev] gzipped fastq reader
Great. Let's put the bx-python calls in a galaxy_open helper function.

On Jul 8, 2013, at 3:20 PM, James Taylor wrote:
> open_compressed in bx-python does this already (for bz2 as well).
>
> On Jul 8, 2013, at 5:58 PM, Peter Cock wrote:
>
>> On Mon, Jul 8, 2013 at 10:24 PM, Robert Baertsch wrote:
>>> Peter and Dan,
>>> I like the idea of replacing all open() with galaxy_open() in all tools. You can tell the format by looking at the first 4 bytes (see C code below from the UCSC browser team). Is there some Pythonic way of overriding open?
>>
>> There is monkey patching (replace the current 'open' function with your modified version), but that is not a good idea in general.
>>
>> In any case, this would only affect the small number of Python tools which happen to use the Galaxy parsing libraries - which is a very small fraction of the tools in Galaxy. Most of the tools in Galaxy are compiled programs and are entirely separate.
>>
>> Peter
Re: [galaxy-dev] gzipped fastq reader
On Tue, Jul 9, 2013 at 5:53 PM, Robert Baertsch wrote:
> On Jul 8, 2013, at 3:33 PM, Peter Cock wrote:
>> The tools available in Galaxy are written in a range of languages including C, Perl, R, etc. Yes, some are in Python, but of those most are independent of Galaxy and can be used separately from Galaxy.
>
> The helper function would have to be ported to R. We are talking about how Galaxy handles compressed data. Once we decide that, we can determine how to best implement it.

Individual tools called from Galaxy read and create the files - and we can't usually control them at this level (modifying them all to call a Galaxy managed file open mechanism is not an option).

> Proposal: Do not treat compressed data as a separate data type. Treat it as an independent attribute that can be applied to any data. Otherwise you will have to create a gzipped, zip and bz2 type for every type that you want to compress.

That's what I've been saying - the fact that some people are already using a new gzipped FASTQ format within their Galaxy instances is practical, but I view it as a short term solution only.

>>> Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content data that is stored in the file.
>>
>> What we'd previously discussed was a dual system, holding the file type as now (e.g. FASTA, SAM, GFF3, etc.) and any compression (e.g. none, normal GZIP, BGZF which is a GZIP variant, BZIP2, etc.).
>
> What about tabular? Should we create tab.gz, tab.bz2 and tab.zip also?

Note ZIP is a bit different, as it is often a multiple file bundle - it behaves differently from GZIP, BGZF, XZ, BZIP2 etc. in that regard.

But otherwise, yes. As a specific example, the tabix tool uses BGZF compressed tabular data to combine compression and efficient random access. This would be useful for many annotation files (e.g. GTF, GFF3).

> This will quickly get out of hand and create a mess for tool developers who need to support all these types.

Why? Individual tool developers don't need to know if Galaxy is keeping the original data file on disk compressed - unless the tool XML says otherwise, Galaxy would hide this detail and call the tool with an uncompressed input file.

(A Unix named pipe which decompresses the file on the fly would be a potential alternative - but only if the tool XML was marked up to say that an input could be streamed. The default must be to assume potential random access to the input files.)

> The tool code and tool XML should be written to handle uncompressed data, and Galaxy should handle the details of decompression. This is not hard to do.

It isn't trivial either ;)

Peter
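Peter's default case, where Galaxy keeps the dataset compressed on disk but hands a naive tool an uncompressed, seekable file, might be sketched as below. This is an illustration under assumptions: `uncompressed_copy` is a hypothetical name, only gzip is handled, and the copy lives in a temporary folder that is deleted when the tool finishes.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


@contextlib.contextmanager
def uncompressed_copy(gz_path):
    """Hypothetical helper: yield the path of a temporary uncompressed copy."""
    name = os.path.basename(gz_path)
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # keep e.g. ".fastq" for tools that check it
    with tempfile.TemporaryDirectory() as tmpdir:
        plain_path = os.path.join(tmpdir, name)
        with gzip.open(gz_path, "rb") as src, open(plain_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield plain_path  # the copy and its directory are removed on exit
```

A job runner could then wrap an unchanged command line, e.g. `with uncompressed_copy(path) as plain: subprocess.run(["some_tool", plain], check=True)`, where `some_tool` stands in for any existing tool. The cost is temporary disk space and decompression time; the tool wrapper itself stays the same.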
Re: [galaxy-dev] gzipped fastq reader
I respectfully disagree. If you want an extensible system, you should always wrap primitive system-level calls.

Any tool that opens a file that could be compressed would be affected. That is a huge number of tools. Do you really want a cottage industry of tools that have different methods of dealing with compression?

Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content data that is stored in the file.

It is up to the Galaxy team to provide a standard way to interact with compressed files.

My proposed solution is a very small change that could be phased in over time. Any tools that use open would not support compressed files, but they would not break on uncompressed files.

Do others have an opinion?

On Jul 8, 2013, at 2:58 PM, Peter Cock wrote:
> On Mon, Jul 8, 2013 at 10:24 PM, Robert Baertsch wrote:
>> Peter and Dan,
>> I like the idea of replacing all open() with galaxy_open() in all tools. You can tell the format by looking at the first 4 bytes (see C code below from the UCSC browser team). Is there some Pythonic way of overriding open?
>
> There is monkey patching (replace the current 'open' function with your modified version), but that is not a good idea in general.
>
> In any case, this would only affect the small number of Python tools which happen to use the Galaxy parsing libraries - which is a very small fraction of the tools in Galaxy. Most of the tools in Galaxy are compiled programs and are entirely separate.
>
> Peter
Re: [galaxy-dev] gzipped fastq reader
Peter and Dan,

I like the idea of replacing all open() calls with galaxy_open() in all tools. You can tell the format by looking at the first 4 bytes (see the C code below from the UCSC Browser team). Is there some Pythonic way of overriding open?

You need to read the first four bytes of the file to see if it is compressed, call gzip.open inside the function, and pass back the handle. For now, it would require a global sweep through the tools to change open() to galaxy_open(), but it is probably a good idea to have tool developers avoid calling open directly. You would need special handling if there are multiple files in the compressed archive, but that support could be added later.

-Robert

    import bz2
    import gzip
    import zipfile

    def read4bytes(filename):
        # Read the file's first four bytes (its "magic number")
        with open(filename, "rb") as handle:
            return handle.read(4)

    def getExtensionFromHdrSig(first4bytes):
        # Python version of the C function below (.Z is not handled)
        if first4bytes.startswith(b"\x1f\x8b"):
            return "gz"
        elif first4bytes.startswith(b"BZ"):
            return "bz2"
        elif first4bytes.startswith(b"PK\x03\x04"):
            return "zip"
        return None

    def galaxy_open(filename, mode="r"):
        # Reading only: sniff the header, fall back to a plain open()
        ext = getExtensionFromHdrSig(read4bytes(filename))
        if ext is not None:
            return openCompressed(filename, mode, ext)
        return open(filename, mode)

    def openCompressed(filename, mode, ext):
        if ext == "gz":
            return gzip.open(filename, mode)
        elif ext == "bz2":
            return bz2.BZ2File(filename, mode)
        # zip: returns the archive object, not a file handle
        return zipfile.ZipFile(filename)

    char *getExtensionFromHdrSig(char *first4bytes)
    /* Check if header has signature of supported compression stream,
       and return the file extension for it, or NULL if no sig found. */
    {
    char *ext = NULL;
    if (startsWith("\x1f\x8b", first4bytes)) ext = "gz";
    else if (startsWith("\x1f\x9d\x90", first4bytes)) ext = "Z";
    else if (startsWith("BZ", first4bytes)) ext = "bz2";
    else if (startsWith("PK\x03\x04", first4bytes)) ext = "zip";
    return ext;
    }

On Jul 8, 2013, at 4:05 AM, Peter Cock wrote:
> On Thu, Jul 4, 2013 at 9:49 PM, Robert Baertsch wrote:
>> Dan,
>> Do these readers support gzip files?
>>
>> reader = fastqVerboseErrorReader
>> reader = fastqReader
>
> Presumably you are writing a Python script using this library? The answer is a qualified yes. Instead of passing them a normal file handle using open("example.fastq") you instead use gzip.open("example.fastq.gz") via import gzip.
>
>> Do I have to define a special type in Galaxy for gzipped files or will the fastq type be ok?
>
> This needs a special file format - but you are not the first person to look at this; some groups have defined custom gzipped variants of the FASTQ formats within their own Galaxy instances. I've not done this but there should be some useful emails in the archive.
>
> Note you'd also need to modify any tool definitions so that they can accept a gzipped FASTQ file.
>
>> Ideally, I would like to keep my files zipped and not have Galaxy unzip them, since they triple in size when unzipped.
>>
>> I'm happy to do a pull request if you don't support this but I want to make sure I'm in line with your roadmap.
>
> Personally I would like a more general system in Galaxy for potentially any file type to be held compressed in a range of formats (e.g. using gzip, bgzf, xz, bz2, etc.), with exclusions for things like BAM which are already compressed. This way naive tools would get the gzipped file uncompressed to a temporary folder before use (i.e. no change for the tool wrapper), but if a tool accepts a gzipped file it will get that (less disk IO and CPU usage, but requires updating tool wrappers).
>
> That idea is quite ambitious though ;)
>
>> I have written a simple tool to convert Illumina fastq to mapsplice fastq. Does that exist already somewhere?
>
> I don't know.
>
> Peter
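Robert notes that multi-file archives need special handling, and Peter points out elsewhere in the thread that ZIP is often a bundle of several files. One conservative policy, sketched below in Python 3 as an assumption rather than anything Galaxy actually does, is to treat a ZIP as a single dataset only when it holds exactly one file, and to reject it otherwise. `open_single_member_zip` is a hypothetical name.

```python
import zipfile


def open_single_member_zip(path):
    """Hypothetical helper: open a ZIP archive as one binary file handle,
    but only if the archive contains exactly one (non-directory) entry."""
    archive = zipfile.ZipFile(path)
    members = [info for info in archive.infolist() if not info.is_dir()]
    if len(members) != 1:
        archive.close()
        raise ValueError(
            "%s holds %d files; expected exactly one" % (path, len(members))
        )
    # The returned member handle keeps the underlying archive file open
    return archive.open(members[0])
```

Bundles with several members would then need an explicit decision (for example, a collection of datasets) instead of being silently reduced to their first entry.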
Re: [galaxy-dev] gzipped fastq reader
open_compressed in bx-python does this already (for bz2 as well).

On Jul 8, 2013, at 5:58 PM, Peter Cock wrote:
> On Mon, Jul 8, 2013 at 10:24 PM, Robert Baertsch wrote:
>> Peter and Dan,
>> I like the idea of replacing all open() with galaxy_open() in all tools. You can tell the format by looking at the first 4 bytes (see C code below from the UCSC browser team). Is there some Pythonic way of overriding open?
>
> There is monkey patching (replace the current 'open' function with your modified version), but that is not a good idea in general.
>
> In any case, this would only affect the small number of Python tools which happen to use the Galaxy parsing libraries - which is a very small fraction of the tools in Galaxy. Most of the tools in Galaxy are compiled programs and are entirely separate.
>
> Peter
Re: [galaxy-dev] gzipped fastq reader
On Mon, Jul 8, 2013 at 11:21 PM, Robert Baertsch wrote:
> I respectfully disagree. If you want an extensible system, you should always wrap primitive system-level calls.
>
> Any tool that opens a file that could be compressed would be affected. That is a huge number of tools. Do you really want a cottage industry of tools that have different methods of dealing with compression?

But defining a Python helper function within the Galaxy Python libraries doesn't achieve that.

Are you talking about patching the OS level POSIX open functions or something?

The tools available in Galaxy are written in a range of languages including C, Perl, R, etc. Yes, some are in Python, but of those most are independent of Galaxy and can be used separately from Galaxy.

> Encoding the gzip status in the datatype will create an explosion of datatypes. Compression is not actually a datatype; it tells you nothing about the content data that is stored in the file.

What we'd previously discussed was a dual system, holding the file type as now (e.g. FASTA, SAM, GFF3, etc.) and any compression (e.g. none, normal GZIP, BGZF which is a GZIP variant, BZIP2, etc.).

Galaxy tool wrappers currently define input files with a list of file types - they'd also have to give a list of supported compression types (defaulting to none). Likewise for any output files - if they are already compressed, the XML for the tool wrapper would have to tell Galaxy this.

> It is up to the Galaxy team to provide a standard way to interact with compressed files.

That is my preference too - although this could be driven by the Galaxy community rather than the core team? I see defining new datatypes like 'gzippedfastq' as a stop-gap special case (but a very practical route for now).

> My proposed solution is a very small change that could be phased in over time. Any tools that use open would not support compressed files, but they would not break on uncompressed files.
>
> Do others have an opinion?

Either I don't understand your plan, or it would only help in a tiny minority of cases.

Regards,

Peter
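The dual system Peter describes (a datatype plus a separate compression attribute, with tool wrappers listing the compressions they accept and defaulting to none) can be modeled in a few lines. This is a hypothetical Python 3 sketch, not Galaxy's datatype registry or tool XML handling; `Compression`, `Dataset`, `ToolInput` and `needs_decompression` are invented names.

```python
from dataclasses import dataclass
from enum import Enum


class Compression(Enum):
    NONE = "none"
    GZIP = "gzip"
    BGZF = "bgzf"  # a GZIP variant, so a GZIP reader can also read it
    BZIP2 = "bzip2"


@dataclass(frozen=True)
class Dataset:
    datatype: str  # content format only, e.g. "fastqsanger", "tabular"
    compression: Compression = Compression.NONE


@dataclass(frozen=True)
class ToolInput:
    formats: frozenset
    # A wrapper that says nothing about compression accepts only plain data
    compressions: frozenset = frozenset({Compression.NONE})


def needs_decompression(dataset, tool_input):
    """Return True if the dataset must be decompressed before the tool runs."""
    if dataset.datatype not in tool_input.formats:
        raise ValueError("tool does not accept datatype %r" % dataset.datatype)
    accepted = set(tool_input.compressions)
    if Compression.GZIP in accepted:
        accepted.add(Compression.BGZF)
    return dataset.compression not in accepted
```

Because compression is orthogonal to the datatype here, `tabular` stays a single datatype whether it is stored plain, gzipped or bzip2-compressed, which avoids the tab.gz / tab.bz2 / tab.zip variants Robert objects to.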
Re: [galaxy-dev] gzipped fastq reader
On Mon, Jul 8, 2013 at 10:24 PM, Robert Baertsch wrote:
> Peter and Dan,
> I like the idea of replacing all open() with galaxy_open() in all tools. You can tell the format by looking at the first 4 bytes (see C code below from the UCSC browser team). Is there some Pythonic way of overriding open?

There is monkey patching (replace the current 'open' function with your modified version), but that is not a good idea in general.

In any case, this would only affect the small number of Python tools which happen to use the Galaxy parsing libraries - which is a very small fraction of the tools in Galaxy. Most of the tools in Galaxy are compiled programs and are entirely separate.

Peter
Re: [galaxy-dev] gzipped fastq reader
On Thu, Jul 4, 2013 at 9:49 PM, Robert Baertsch wrote:
> Dan,
> Do these readers support gzip files?
>
> reader = fastqVerboseErrorReader
> reader = fastqReader

Presumably you are writing a Python script using this library? The answer is a qualified yes. Instead of passing them a normal file handle using open("example.fastq") you instead use gzip.open("example.fastq.gz") via import gzip.

> Do I have to define a special type in Galaxy for gzipped files or will the fastq type be ok?

This needs a special file format - but you are not the first person to look at this; some groups have defined custom gzipped variants of the FASTQ formats within their own Galaxy instances. I've not done this but there should be some useful emails in the archive.

Note you'd also need to modify any tool definitions so that they can accept a gzipped FASTQ file.

> Ideally, I would like to keep my files zipped and not have Galaxy unzip them, since they triple in size when unzipped.
>
> I'm happy to do a pull request if you don't support this but I want to make sure I'm in line with your roadmap.

Personally I would like a more general system in Galaxy for potentially any file type to be held compressed in a range of formats (e.g. using gzip, bgzf, xz, bz2, etc.), with exclusions for things like BAM which are already compressed. This way naive tools would get the gzipped file uncompressed to a temporary folder before use (i.e. no change for the tool wrapper), but if a tool accepts a gzipped file it will get that (less disk IO and CPU usage, but requires updating tool wrappers).

That idea is quite ambitious though ;)

> I have written a simple tool to convert Illumina fastq to mapsplice fastq. Does that exist already somewhere?

I don't know.

Peter
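Peter's "qualified yes" works because a parser that only consumes a file handle does not care whether the handle came from open() or gzip.open(). The sketch below shows this with a deliberately minimal FASTQ parser. `read_fastq` and `read_fastq_file` are hypothetical and are not Galaxy's fastqReader, whose exact interface is not shown in this thread. It assumes Python 3, where gzip.open(path, "rt") yields text lines, and four-line records without wrapped sequences.

```python
import gzip


def read_fastq(handle):
    """Minimal FASTQ parser (hypothetical): four lines per record, no
    line-wrapped sequences. Works with any text-mode handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("Malformed FASTQ record near %r" % header)
        yield header[1:].rstrip("\n"), seq, qual


def read_fastq_file(path):
    """Open plain or gzipped FASTQ; this sketch decides by file extension."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from read_fastq(handle)
```

Choosing the opener by extension is the simplest option; sniffing the gzip magic bytes, as Robert suggests, would also cope with misnamed files.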