A converted dataset would be fine too.

I'm working on an enhancement that would allow the metadata to be provided when 
the file is uploaded/registered via the API. So to do what you say, I'd need to 
have a way of providing that converted dataset.

The files I'm talking about are concatenated GZIP files, and the GZIP format 
specification doesn't record the size of the compressed data, only the 
uncompressed size (and even that only modulo 2^32). AFAIK, anything in 
Galaxy that tried to create the auxiliary index would need to read and 
decompress all the data in the file to do that - easily an hour's worth of work 
for some of our full genome runs. We have all that information already when we 
make the file, so I'd prefer to just give it to Galaxy at the start. I could 
place the index in a special section of the first GZIP header, but then this 
capability would not be as general-purpose as it could be.
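
To illustrate why the index is cheap to build at write time, here is a minimal 
sketch. It assumes a modern Python 3 and is not tied to Galaxy in any way; the 
function names (write_indexed_gzip, read_member, trailer_isize) and the index 
layout are hypothetical, chosen only for this example.

```python
import gzip
import struct


def write_indexed_gzip(path, chunks):
    """Write each chunk as its own gzip member and record where it starts.

    Returns a list of dicts (hypothetical layout) with the byte offset and
    sizes of every member, known for free because we are the writer.
    """
    index = []
    with open(path, "wb") as out:
        for chunk in chunks:
            start = out.tell()
            member = gzip.compress(chunk)  # one complete gzip member
            out.write(member)
            index.append({
                "offset": start,
                "compressed_size": len(member),
                "uncompressed_size": len(chunk),
            })
    return index


def read_member(path, entry):
    """Seek straight to one member and decompress only that member."""
    with open(path, "rb") as f:
        f.seek(entry["offset"])
        return gzip.decompress(f.read(entry["compressed_size"]))


def trailer_isize(path, entry):
    """Read the ISIZE field: the last 4 bytes of a member (little-endian),
    holding the uncompressed size modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(entry["offset"] + entry["compressed_size"] - 4)
        return struct.unpack("<I", f.read(4))[0]
```

With an index like this, a reader can call read_member(path, index[i]) to 
get section i without touching the rest of the file. Without it, the member 
boundaries can only be found by decompressing from the start, since the 
trailer gives the uncompressed size and nothing about the compressed length.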

I also want to avoid unnecessary gzip decompression in Python, because 
decompressing large files in versions before 2.7 is so slow as to be unusable 
for large datasets.

Is there a way to upload that converted dataset when I upload/register the main 
file? I'd also need to know how to write such a file.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-----Original Message-----
From: James Taylor [mailto:ja...@jamestaylor.org] 
Sent: Friday, August 26, 2011 5:37 AM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Storing a dict as metadata

Hey John, are you sure you don't want to use a "converted dataset" rather than 
a metadata element for this? This is how we handle most types of secondary 
indexes for visualization. 

If you do it this way, the converter that creates the offset index is just 
another tool (but registered in datatypes_conf.xml) and the index itself is 
another dataset that can be accessed through the converted datasets 
relationship. 

On Aug 25, 2011, at 6:12 PM, Duddy, John wrote:

> I'd like to have a datatype with a dict as metadata. This dict() would store 
> file offsets to enable seeking around to process different sections of the 
> file.
>  
> How do I add a dictionary metadata element?
>  
>  
>  
> John Duddy
> Sr. Staff Software Engineer
> Illumina, Inc.
> 9885 Towne Centre Drive
> San Diego, CA 92121
> Tel: 858-736-3584
> E-mail: jdu...@illumina.com
>  
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/


