On Tue, 06 Dec 2016, Elam, Jennifer wrote:
>    A listing of the by subject unpacked files available, organized by
>    modality and processing level, are available in Appendix 3 of the
>    Reference Manual.

>    The files are listed there as they unpack into a standard directory
>    structure. They are not organized by ConnectomeDB packages, per se,
>    because the listing is to be also applicable to users of Connectome in a
>    Box and Amazon S3. If you really need a listing of the package contents
>    themselves, we (Mike Hodge) can provide that separately.


On Tue, 06 Dec 2016, Hodge, Michael wrote:
> Yaroslav,

> Separate packages are created for each subject.  The list I sent just listed 
> packages for a couple of subjects to show you the files contained in the 
> packages by example.  There aren't packages that correspond to the unrelated 
> groups.  Each subject in the groups has a set of packages.  I could repeat 
> the unzip search across all subjects if you wish, but it would be a very 
> large file.


Dear Jennifer and Michael,

Thank you for your replies!

Let me describe my target use case and why I was asking about
packages; perhaps that will make the situation a bit clearer.

The HCP S3 bucket provides convenient access to the dataset's individual files,
but those files carry no annotation about which package(s) (as shipped from
ConnectomeDB) any particular file belongs to.  That "packaging" is important
meta-information, since many people analyze data from a particular "package".

In the DataLad project we would like to provide access to the data in the HCP
bucket, but we would also like to let users specify "packages" -- i.e. which
specific sub-datasets to install (e.g. not all subjects, when not all subjects
belong to a package) and which files to download.  Assuming 7T_MOVIE_2mm_preproc
is the name of an example package containing a subset of subjects with 7T movie
"task" data, it would look like the following:

        datalad search 7T_MOVIE_2mm_preproc | xargs datalad install

to install those subjects' datasets (git-annex repositories without actual data
by default), and then (hypothetical API)

        datalad get -r --annex-meta 7T_MOVIE_2mm_preproc 

to actually fetch the data files present in the 7T_MOVIE_2mm_preproc package.

Similarly, they could later run 

Since, I guess, you already compose those "packages" from some list of
rules/files, I thought those lists could perhaps be shared, so we could embed
that information in our git-annex HCP repositories without incurring any
additional development/setup/maintenance cost (beyond dumping listings of the
generated .zip files).  If the listings were plain .txt files (easily machine
readable, unlike formatted PDFs), people could also easily come up with
one-line shell scripts to fetch the files belonging to a package from S3,
e.g. along the lines of the sketch below.
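
For instance (a rough sketch under my own assumptions: the per-package listing
file name 7T_MOVIE_2mm_preproc.txt, its one-S3-key-per-line format, and the
hcp-openaccess bucket name are all hypothetical here, not something that exists
yet):

        # hypothetical: 7T_MOVIE_2mm_preproc.txt is assumed to list one S3 key
        # (relative path) per line, under an assumed hcp-openaccess bucket
        while read -r key; do
            mkdir -p "$(dirname "$key")"
            aws s3 cp "s3://hcp-openaccess/$key" "$key"
        done < 7T_MOVIE_2mm_preproc.txt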

So -- overall -- the listings Michael produced would work, but I wondered if we
could avoid (re)creating them and perhaps make them even friendlier for
machine parsing (e.g. one .txt file per package, containing the paths of the
files for all the subjects in that package).

BTW -- the 7T_MOVIE_2mm_preproc set of files is not yet in the S3 bucket.  When
will that portion be uploaded?

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        
_______________________________________________
HCP-Users mailing list
HCP-Users@humanconnectome.org
http://lists.humanconnectome.org/mailman/listinfo/hcp-users
