Organised mirroring of public databases

Steffen Moeller Fri, 18 Jan 2008 02:27:37 -0800

Dear all,

I have parked a script at

http://svn.debian.org/wsvn/debian-med/trunk/community/infrastructure/getData.pl?op=file&rev=0&sc=0

that downloads external databases in a fairly straightforward manner.
It is far from perfect, but it may help us get organised towards that
shared aim.

The tool should be extended to allow
 * the addition of databases locally (but hey, since we are on svn and
the databases are mostly public, there should not be much need to add
databases for oneself only)
 * versioning of databases. Most sites keep past releases available for
a while, and this should be modelled properly.
 * the formal specification of subsets of the databases, such as only
mammalian or human data, if upstream maintainers offer them as such.
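
To make the versioning and subset points a bit more concrete, here is a
rough sketch of how a database description might be modelled. It is
written in Python purely for illustration and is not taken from
getData.pl. The class names, the "exampledb" entry, its file names and
the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """One upstream release of a database (hypothetical model)."""
    version: str
    base_url: str
    # Maps a subset name (e.g. "human") to the file upstream offers for it.
    subsets: dict = field(default_factory=dict)


@dataclass
class Database:
    """A public database together with the releases known for it."""
    name: str
    releases: list = field(default_factory=list)  # oldest first

    def release(self, version=None):
        """Return the named release, or the last listed one if none is named."""
        if version is None:
            return self.releases[-1]
        for rel in self.releases:
            if rel.version == version:
                return rel
        raise KeyError(f"{self.name}: unknown release {version!r}")

    def url(self, subset="all", version=None):
        """Build the download URL for a subset of a given release."""
        rel = self.release(version)
        if subset not in rel.subsets:
            raise KeyError(
                f"{self.name} {rel.version}: subset {subset!r} not offered upstream"
            )
        return rel.base_url.rstrip("/") + "/" + rel.subsets[subset]


# Hypothetical entry: the second release adds a human-only subset.
exampledb = Database(
    name="exampledb",
    releases=[
        Release("1.0", "http://example.org/exampledb/1.0",
                {"all": "exampledb.dat.gz"}),
        Release("2.0", "http://example.org/exampledb/2.0",
                {"all": "exampledb.dat.gz",
                 "human": "exampledb_human.dat.gz"}),
    ],
)
```

With such a description, exampledb.url("human") would resolve against
the latest listed release, while exampledb.url("all", "1.0") would pin a
past one. Asking for a subset that a release does not offer raises an
error instead of guessing, in line with only modelling what upstream
actually provides.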

We should not (immediately) think of
 * the specification of local mirrors of some public site
 * disk space issues
 * dependencies between downloaded datasets, e.g., the automated rewrite
of EMBL format to FASTA, since the converted files are available online
as well. This would induce ambiguities and possibly also increase the
bandwidth used.

So, which database should we address first? I suggest the small ones.

Best regards

Steffen


