[galaxy-dev] Re: How to include a large datafile in a bioconda package?
That seems a good compromise within conda, since Bioconda wouldn't want the binary package itself to be too big. (I'm doing something similar with some real sample data for a tool, putting it up on Zenodo. Of course, this is optional for my tool - your use case is different.)

The Galaxy Data Manager route seems more appropriate if there is a choice of large data files which could be used with the tool (not just one).

Peter
[galaxy-dev] Re: How to include a large datafile in a bioconda package?
Hi Bjoern,

Thank you for your direction and information. The post-link script is exactly what I was looking for. I am glad I asked the question here. Thank you.

Best regards,
Jin
[galaxy-dev] Re: How to include a large datafile in a bioconda package?
Hi Jin,

you can use a post-link script in conda. Like here:
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/picrust2/post-link.sh

This way the data can be fetched during tool installation.

See more information here:
https://docs.conda.io/projects/conda-build/en/latest/resources/link-scripts.html

Ciao,
Bjoern
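A post-link script along those lines is just a small shell script shipped next to the recipe's build.sh; conda runs it when the package is installed, with $PREFIX pointing at the environment being installed into. A minimal sketch of the idea - the URL, file name, and target subdirectory below are placeholders, not taken from the picrust2 recipe:

    #!/bin/bash
    # post-link.sh -- runs when the conda package is installed.
    # NOTE: the URL, file name and target directory are placeholders;
    # point them at the real, permanent location of your pre-computed data.
    set -euo pipefail

    DATA_URL="https://zenodo.org/record/XXXXXXX/files/precomputed_data.tar.gz"  # placeholder
    TARGET_DIR="${PREFIX}/share/mytool/data"                                    # hypothetical layout

    mkdir -p "${TARGET_DIR}"

    # conda's link-script docs recommend writing install-time messages to
    # $PREFIX/.messages.txt rather than stdout/stderr.
    echo "Downloading pre-computed data for mytool (this may take a while)..." >> "${PREFIX}/.messages.txt"

    curl -L --fail -o "${TARGET_DIR}/precomputed_data.tar.gz" "${DATA_URL}"
    tar -xzf "${TARGET_DIR}/precomputed_data.tar.gz" -C "${TARGET_DIR}"
    rm "${TARGET_DIR}/precomputed_data.tar.gz"

The sketch assumes curl is available at install time; a recipe would typically depend on curl or wget, or use whichever downloader the rest of its post-link logic already relies on.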
[galaxy-dev] Re: [External] Re: How to include a large datafile in a bioconda package?
When developing tools for CTAT, we used a Data Manager to do this sort of thing. So the admin has to install both the tool and the Data Manager, and use the Data Manager to download the large file and put it in the desired location on the system.

Cicada Dennis
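For readers unfamiliar with Data Managers: the Data Manager tool runs a small script that downloads the file into a Galaxy-managed directory and then writes a JSON description of the new entry so Galaxy can add it to a tool data table, where the wrapped tool later looks the path up. A rough sketch of such a script, assuming a hypothetical data table named "mytool_data" and a placeholder URL (the real CTAT Data Managers are more involved, and how the script receives its arguments is defined by the Data Manager's tool XML, which is not shown here):

    #!/bin/bash
    # Hypothetical Data Manager script: download a pre-computed data file and
    # register it in a tool data table. In this sketch the tool XML passes the
    # output JSON dataset path as $1 and a Galaxy-managed target directory as $2.
    set -euo pipefail

    OUTPUT_JSON="$1"
    TARGET_DIR="$2"
    DATA_URL="https://zenodo.org/record/XXXXXXX/files/precomputed_data.tar.gz"   # placeholder

    mkdir -p "${TARGET_DIR}"
    curl -L --fail -o "${TARGET_DIR}/precomputed_data.tar.gz" "${DATA_URL}"

    # Describe the new entry for the (hypothetical) "mytool_data" data table.
    cat > "${OUTPUT_JSON}" <<EOF
    {
      "data_tables": {
        "mytool_data": [
          {
            "value": "mytool_data_2019_07",
            "name": "MyTool pre-computed data (July 2019)",
            "path": "${TARGET_DIR}/precomputed_data.tar.gz"
          }
        ]
      }
    }
    EOF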
[galaxy-dev] Re: How to include a large datafile in a bioconda package?
Hi Brad,

Thank you for your quick reply. I can put the data file on Zenodo so that I will have a permanent location for it.

As for re-computing the data file locally, it would take several days to run, so recomputing it would be quite inefficient. I am expecting the data file to be downloaded automatically when the package is installed. Is there a convention for doing that? Thank you.

Best regards,
Jin
[galaxy-dev] Re: How to include a large datafile in a bioconda package?
Hi:

I’d be concerned about that file changing or disappearing and causing irreproducibility. If the URL were to a permanent location (e.g. NCBI or Zenodo), maybe it’s ok.

Could it be re-computed locally if necessary (like a genome index)?

Maybe others know of examples where this is done.

Brad

Bradley W. Langhorst, Ph.D.
Development Group Leader
New England Biolabs
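One common way to address the "file changing or disappearing" worry is to point at a versioned, permanent record (such as a specific Zenodo deposit) and verify a checksum after the install-time download, so a silently changed file fails loudly instead of quietly producing different results. A small sketch of that check, with a placeholder URL and digest:

    # Verify the downloaded data file against a pinned SHA-256 digest.
    # Both the URL and the digest below are placeholders.
    set -euo pipefail
    DATA_URL="https://zenodo.org/record/XXXXXXX/files/precomputed_data.tar.gz"
    DATA_SHA256="0000000000000000000000000000000000000000000000000000000000000000"

    curl -L --fail -o precomputed_data.tar.gz "${DATA_URL}"
    echo "${DATA_SHA256}  precomputed_data.tar.gz" | sha256sum -c -   # exits non-zero on mismatch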
[galaxy-dev] How to include a large datafile in a bioconda package?
Hi all,

I am not sure if this mailing list is a good place to ask a Bioconda question; sorry to bother you if not. I want to ask how to include a large data file when publishing a Bioconda package. Our program depends on a pre-computed data file, which is too large to be included in the source code package. The data file can be accessed via a public URL. Can I put the download command in `build.sh` when publishing a Bioconda package? If not, is there a convention for dealing with dependent large data files? Thank you.

Best regards,
Jin