Hi John,
Yes, I think this should work; I have seen it work for another binary datatype
I made before. See below:
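import zipfile

# Assumes this datatype lives in its own module under galaxy/datatypes/.
from galaxy.datatypes.binary import Binary


class FileSet(Binary):
    """FileSet containing N files"""
    file_ext = "prims.fileset.zip"
    blurb = "(zipped) FileSet containing multiple files"

    def sniff(self, filename):
        # Return True if this is a zip file containing more than one file,
        # False otherwise (including when it is not a zip file at all,
        # which would otherwise make zipfile.ZipFile raise).
        if not zipfile.is_zipfile(filename):
            return False
        with zipfile.ZipFile(filename) as zf:
            return len(zf.infolist()) > 1

# The if is just for backwards compatibility (older Galaxy releases lack this
# method); it could be removed at some point.
if hasattr(Binary, 'register_sniffable_binary_format'):
    Binary.register_sniffable_binary_format('FileSet', 'prims.fileset.zip', FileSet)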
Now my question is: what would be good logic to use in the sniff method? I need
something that uniquely distinguishes this zipped file from other zip files,
right? In the example above I solved this by checking whether the zip file
contains multiple files and returning True if so. With RData, does that mean I
have to parse the binary contents and come up with a good heuristic/rule? Just
wondering whether someone has already thought about such a rule, specifically
for RData.
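A minimal sketch of one possible rule, based on an assumption about R's file format rather than on anything stated in this thread: a file written by R's save() starts, once decompressed, with a 5-byte header of the form "RD" + encoding + version + newline, e.g. "RDX2\n" (XDR binary, format version 2), "RDX3\n" (version 3, R >= 3.5), or "RDA2\n"/"RDB2\n" for the ASCII and native-binary variants. save() gzip-compresses by default, and its compress= argument can select bzip2 or xz instead. Undoing whichever compression is present and checking that header separates RData from arbitrary gzip or zip files. `looks_like_rdata` and `RDATA_MAGICS` are hypothetical names, and the sketch is written for Python 3.

```python
import bz2
import gzip
import lzma

# Assumed save() headers: "RD" + encoding (A=ASCII, B=native binary, X=XDR)
# + workspace format version + newline.
RDATA_MAGICS = (b"RDX2\n", b"RDX3\n", b"RDA2\n", b"RDA3\n", b"RDB2\n", b"RDB3\n")


def _open_maybe_compressed(filename):
    """Open filename for binary reading, transparently undoing gzip/bzip2/xz."""
    with open(filename, "rb") as fh:
        head = fh.read(6)
    if head.startswith(b"\x1f\x8b"):          # gzip magic
        return gzip.open(filename, "rb")
    if head.startswith(b"BZh"):               # bzip2 magic
        return bz2.open(filename, "rb")
    if head.startswith(b"\xfd7zXZ\x00"):      # xz magic
        return lzma.open(filename, "rb")
    return open(filename, "rb")               # uncompressed (e.g. compress=FALSE)


def looks_like_rdata(filename):
    """Hypothetical sniff helper: True if the (decompressed) file starts with an RData header."""
    try:
        with _open_maybe_compressed(filename) as fh:
            header = fh.read(5)
    except (OSError, EOFError, lzma.LZMAError):
        # Corrupt or truncated compressed data: not something we recognise.
        return False
    return header in RDATA_MAGICS
```

In a Galaxy datatype, sniff() could simply return looks_like_rdata(filename), with the class registered via register_sniffable_binary_format as in the FileSet example. Note that files written by saveRDS() carry no "RD" header and would not be matched by this rule.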
Thanks,
Pieter.
-----Original Message-----
From: John Chilton
Sent: Thursday, 23 October 2014 3:02
To: Lukasse, Pieter
Cc: [email protected]
Subject: Re: [galaxy-dev] strange issue with .RData files
Hey Pieter,
Sorry, I am swamped right now, so I don't have time to dig into this in
detail, but I have encountered this before with compressed datatypes (zipped,
gzipped, etc.): Galaxy will attempt to decompress them in order to figure out
what they are. I believe this is what is happening to your data. If you
register the type as a sniffable binary, it looks like the decompression
should be skipped, unless I am misreading the logic in
tools/data_source/upload.py:
https://gist.github.com/jmchilton/54b5d7485fcd16eec984
E.g., as is done for the bam datatype:
class Bam(Binary):
    ....
Binary.register_sniffable_binary_format("bam", "bam", Bam)
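To make the mechanism concrete, here is a heavily simplified, self-contained model of it. It is not Galaxy's code: the stub `Binary` class only imitates the `register_sniffable_binary_format` registry used in these messages, `is_sniffable_binary` and `should_decompress` are illustrative names, and only gzip is modelled, whereas the real upload logic in tools/data_source/upload.py also deals with zip and other compressions.

```python
# Stand-in for Galaxy's Binary base class (hypothetical, simplified).
class Binary:
    sniffable_binary_formats = []

    @staticmethod
    def register_sniffable_binary_format(data_type, ext, type_class):
        Binary.sniffable_binary_formats.append(
            {"type": data_type, "ext": ext, "class": type_class})

    @staticmethod
    def is_sniffable_binary(filename):
        """Return (type, ext) of the first registered format whose sniff() accepts the file."""
        for fmt in Binary.sniffable_binary_formats:
            try:
                if fmt["class"]().sniff(filename):
                    return fmt["type"], fmt["ext"]
            except Exception:
                continue  # a failing sniffer just means "not this format"
        return None


def should_decompress(filename):
    """Toy upload decision: keep registered binaries as-is, unpack other gzip files."""
    if Binary.is_sniffable_binary(filename) is not None:
        return False  # a registered binary datatype claimed the file
    with open(filename, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"  # unclaimed gzip input gets unpacked
```

With nothing registered, a gzip file is flagged for decompression; once a datatype whose sniff() accepts it has been registered, the same file is kept as uploaded. In this model, that is why registering RData as a sniffable binary would stop the unpacking.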
Have you registered a sniffable binary datatype for RData?
-John
On Wed, Oct 22, 2014 at 9:38 AM, Lukasse, Pieter wrote:
> Hi,
>
> When I upload any .RData file to my Galaxy server, it seems to get
> unpacked/changed. The resulting file in my history is different from, and
> roughly twice as large as, the uploaded file. The tool that needs to use it
> also aborts with an error because of this altered file.
>
> What workarounds are there?
>
> Thanks,
>
> Pieter Lukasse
> Wageningen UR, Plant Research International
> Department of Bioinformatics (Bioscience)
> Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB,
> Wageningen, the Netherlands
> T: +31-317481122;
> M: +31-628189540;
> skype: pieter.lukasse.wur
> http://www.pri.wur.nl
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this and other
> Galaxy lists, please use the interface at:
> http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
> http://galaxyproject.org/search/mailinglists/