Re: [EMBOSS] [Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)
On 11/30/2011 11:32 AM, Pjotr Prins wrote:
> Git is not very good for storing large data files, which we would want to fetch partially. My suggestion would be to have a plain old file repo, e.g. on S3, which can be mirrored by others.

We had issues with large files in the EMBOSS release, and make those available via rsync to add to the developers' CVS checkout. They include the NCBI taxonomy source and index files and the ontology source and index files.

The next EMBOSS release will include http and ftp URLs as valid inputs for any data type, so EMBOSS could use remote files for format tests. I'll look into how other repositories could be added.

I had to add some extra qualifiers to allow queries and offsets to be specified, and rewrote the query language parsing to merge very similar code segments.

regards,

Peter Rice
EMBOSS Team
___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
Re: [EMBOSS] [Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)
On Wed, Nov 30, 2011 at 11:38 AM, Peter Rice p...@ebi.ac.uk wrote:
> On 11/30/2011 11:32 AM, Pjotr Prins wrote:
>> Git is not very good for storing large data files, which we would want to fetch partially. My suggestion would be to have a plain old file repo, e.g. on S3, which can be mirrored by others.
>
> We had issues with large files in the EMBOSS release, and make those available via rsync to add to the developers' CVS checkout. They include the NCBI taxonomy source and index files and the ontology source and index files.
>
> The next EMBOSS release will include http and ftp URLs as valid inputs for any data type, so EMBOSS could use remote files for format tests. I'll look into how other repositories could be added.
>
> I had to add some extra qualifiers to allow queries and offsets to be specified, and rewrote the query language parsing to merge very similar code segments.
>
> regards,
>
> Peter Rice
> EMBOSS Team

How about an OBF hosted FTP site then if we want big data?

I guess we'd mostly be adding files, and changes/deletions should be rare, so a full version tracking repository isn't essential if we are disciplined about updating README files or more formal meta data.

Peter
Re: [EMBOSS] [Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)
On Wed, Nov 30, 2011 at 11:45 AM, Pjotr Prins pjotr.publi...@thebird.nl wrote:
> On Wed, Nov 30, 2011 at 11:42:22AM +, Peter Cock wrote:
>> How about an OBF hosted FTP site then if we want big data?
>
> Yes :)
>
>> I guess we'd mostly be adding files, and changes/deletions should be rare, so a full version tracking repository isn't essential if we are disciplined about updating README files or more formal meta data.
>
> We can still have the readme's and MD5s mirrored in a small repo. That would track changes/moving/renaming.
>
> Pj.

True, or even a hybrid where small files also live in a git repo, but for larger files we just store the URL and MD5?

Peter
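The hybrid idea above (small files in git, only a URL and MD5 recorded for large files) could be sketched roughly as follows. This is a minimal illustration, not anything the thread agreed on: the manifest layout (`<md5>  <url>` per line, `#` for comments) and the function names `md5_of_file`, `parse_manifest` and `verify_download` are assumptions made for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse manifest lines of the assumed form '<md5>  <url>'.

    Blank lines and lines starting with '#' are skipped.
    Returns a dict mapping each URL to its expected lowercase MD5.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        md5, url = line.split(None, 1)
        entries[url.strip()] = md5.lower()
    return entries


def verify_download(path, url, manifest):
    """Check a local copy of `url` against the MD5 recorded in the manifest."""
    return md5_of_file(path) == manifest[url]
```

With a layout like this, the small git repo would track only the manifest and READMEs, so additions, renames and moves of large files show up as ordinary text diffs, while the files themselves stay on the FTP or S3 mirror.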
Re: [EMBOSS] [Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)
On Nov 30, 2011, at 5:58 AM, Peter Cock wrote:
> On Wed, Nov 30, 2011 at 11:45 AM, Pjotr Prins pjotr.publi...@thebird.nl wrote:
>> On Wed, Nov 30, 2011 at 11:42:22AM +, Peter Cock wrote:
>>> How about an OBF hosted FTP site then if we want big data?
>>
>> Yes :)
>>
>>> I guess we'd mostly be adding files, and changes/deletions should be rare, so a full version tracking repository isn't essential if we are disciplined about updating README files or more formal meta data.
>>
>> We can still have the readme's and MD5s mirrored in a small repo. That would track changes/moving/renaming.
>>
>> Pj.
>
> True, or even a hybrid where small files also live in a git repo, but for larger files we just store the URL and MD5?
>
> Peter

There was an initial push for this years ago IIRC, with the biodata repository, but it never took off. Not sure if the dev.open-bio.org CVS repo is even browsable anymore (I believe this was all synced to portal for browsing), but the old biodata CVS repo is still in /home/repositories/biodata (very little there, might as well start from scratch).

chris