Re: RDF data archives

David Booth Thu, 05 Dec 2013 22:21:51 -0800

Hi Michel,

On 12/05/2013 07:05 PM, Michel Dumontier wrote:

Hi all,
  As you may know, Bio2RDF produces RDF dumps of its RDF datasets [1,2].
For each dataset, we generate a dataset description file (as per [3];
example [4]) that is in n-triples format, while the dataset is comprised
of one or more *gzipped* n-triple files. I just noticed that LODStats
did not correctly parse [5] these files to generate the dataset
statistics, owing, perhaps, to the assignment of
"application/x-ntriples" in the relevant datahub.io <http://datahub.io>
resource metadata.
I'd like to know what mime type we should specify for zipped, gzipped
RDF data.

If you assume that the recipient will want to unzip them before parsing
(as opposed to parsing *while* unzipping) then you could use a normal
RDF MIME type but specify a gzip HTTP Content-Encoding:

http://stackoverflow.com/questions/864448/how-to-set-content-encoding-with-gzip
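
To make that concrete, here is a minimal sketch in Python of what the
server and the recipient would each do. The helper names gzip_response
and decode_response are made up for illustration, and
"application/n-triples" (the media type registered by RDF 1.1 N-Triples)
is my assumption for the "normal" type; you may prefer to keep
"application/x-ntriples" if that is what your consumers expect.

```python
import gzip

# Assumption: the media type registered by RDF 1.1 N-Triples.
NTRIPLES_TYPE = "application/n-triples"


def gzip_response(payload: bytes, media_type: str = NTRIPLES_TYPE):
    """Hypothetical server-side helper.

    Compresses the RDF payload and returns (headers, body). Content-Type
    describes the data *after* decoding; Content-Encoding says how the
    bytes on the wire were compressed.
    """
    body = gzip.compress(payload)
    headers = {
        "Content-Type": media_type,
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),
    }
    return headers, body


def decode_response(headers: dict, body: bytes) -> bytes:
    """Hypothetical client-side helper.

    Undoes the content coding, if any, so that the parser only ever
    sees plain N-Triples bytes.
    """
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    triple = b'<http://example.org/s> <http://example.org/p> "o" .\n'
    headers, body = gzip_response(triple)
    # The round trip should give back exactly the original triples.
    assert decode_response(headers, body) == triple
```

The point of splitting it this way is that the media type keeps
describing the RDF serialization, so a client that understands
Content-Encoding can decompress first and then pick its parser from
Content-Type as usual.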

David


As we prepare for our next release, we're planning to generate n-quads
for the datasets, thereby linking versioned datasets with their
metadata. We are wondering whether there will be sufficient support for
this format. Also, we are wondering whether it would be problematic to
provide single file downloads that are tar.gz formatted.

comments and suggestions most welcome,

m.


[1] http://bio2rdf.org/datasets
[2] http://download.bio2rdf.org/
[3] https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-Dataset-Provenance
[4] http://download.bio2rdf.org/current/affymetrix/bio2rdf-affymetrix-20121004.nt
[5] http://stats.lod2.eu/rdfdocs?search=bio2rdf

--
Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford
University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com
