AW: [pedantic-web] ANN: 20th Century Press Archives as ORE / Linked Data application - Technical Preview

Neubert Joachim Tue, 29 Dec 2009 14:28:52 -0800

Hi Ed,
 
It's great to see you exploring our data with your own tools (and thanks for 
releasing them - I will happily add them to my toolset!). 
 
Your type counts seem right (336+185+2+2+1=526) - everything is an 
ore:Aggregation (defined by an ore:ResourceMap) and, except the top Aggregation, 
has another type attached, in order to be able to handle them differently for 
purposes like web page generation.
 
You spotted correctly that the homegrown types are not yet defined and need 
much more work. The skos:prefLabel is not intended to mean that PmPersonFolder 
is a specialisation of skos:Concept - it was used to indicate that this is a 
*unique* label (which could be used e.g. within a list of search results) - 
sorry for causing this misunderstanding. For our STW project 
(http://zbw.eu/stw) I started with a few extensions to SKOS and named the vocab 
file "skos-extensions" accordingly. Now I'm trapped, and hesitate to really add 
dcterms- or bibo-extensions to this file. However, I also don't want to add 
"dcterms-extensions", "bibo-extensions", etc. files. Nor do all these 
rather trivial and highly custom extensions constitute a cohesive vocabulary 
of their own. 
 
Irrespective of this self-introduced hiccup I was searching for some RDF types, 
and haven't yet found anything that fits well for
 
- Folders: The physical archives consist of folders (which contain sheets of 
paper with affixed articles). Since folders are also in broad use in file 
archives, historical records etc., I am thinking about suggesting that the 
bibo guys introduce a bibo:Folder subclass of bibo:Collection. PmPersonFolder, 
PmCompanyFolder etc. could be smoothly derived from such a superclass. 
 
- Pages: I'm sure that somebody has coined an RDF type for pages, but I wasn't 
able to figure out where. I assume you had a similar problem at 
chroniclingamerica.loc.gov (even though a whole newspaper page is not exactly 
the same as (a part of) an article glued onto a sheet of paper). How did you 
solve it?
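
To illustrate the Folders idea above, here is a minimal sketch of how the proposed hierarchy could be walked. It rests on assumptions: bibo:Folder is only a suggestion in this mail and does not exist in bibo, the Pm* classes are not yet formally defined, and the "zbwext:" prefix is a hypothetical shorthand for the skos-extensions namespace.

```python
# Hypothetical class hierarchy: bibo:Folder is only proposed in this mail,
# and "zbwext:" is an assumed prefix for the not-yet-defined Pm* classes.
SUBCLASS_OF = {
    "bibo:Folder": "bibo:Collection",
    "zbwext:PmPersonFolder": "bibo:Folder",
    "zbwext:PmCompanyFolder": "bibo:Folder",
}


def superclasses(cls):
    """Return all ancestors of cls by following rdfs:subClassOf links upward."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


print(superclasses("zbwext:PmPersonFolder"))
# prints ['bibo:Folder', 'bibo:Collection']
```

With such a superclass in place, a generic client that only knows bibo:Collection could still treat every person or company folder as a collection.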
 
The dct:subject links to DNB authority file are quite preliminary, and I will 
gladly substitute them when DNB publishes its LOD version of the authority 
files. 
 
Of course it's OK for you to experiment also with the JPEGs (which have been on 
the web for years already). Without them, it's only half of the fun ;) We have 
not figured out, however, under which license conditions this data could be 
re-used. German law ("Urheberrecht") requires an OK by the original author or 
her legal heirs until 70 years after her death. This is almost impossible for 
most of the newspaper articles (which often were published without naming an 
author at all). So I'm not sure which kind of license could be granted to a 
third party, and the lack of legal certainty may even prohibit the 
replication of the JPEGs in a LOCKSS scenario. If, however, the data could be 
useful for demonstrating what can be achieved with ORE harvesting, I would be 
really happy.
 
Thanks again for all your comments, encouragement, and - most exciting - using 
the data. That's the idea of the tribe and the whole linked data community!
 
Cheers, Joachim

________________________________

From: oai-...@googlegroups.com on behalf of Ed Summers
Sent: Mon 28.12.2009 19:29
To: pedantic-...@googlegroups.com
Cc: public-lod@w3.org; oai-...@googlegroups.com
Subject: Re: [pedantic-web] ANN: 20th Century Press Archives as ORE / Linked 
Data application - Technical Preview

On Mon, Dec 28, 2009 at 8:07 AM, Neubert Joachim <j.neub...@zbw.eu> wrote:
> Please feel invited to take a look at it - we would highly appreciate any
> feedback about our approach.

Thanks for announcing this Joachim. It is great to see more linked
data as rdfa getting out on the web. I'm particularly excited because
of your use of the oai-ore vocabulary to make historic newspaper
archives available, since we are doing something similar at the
Library of Congress [1].

You must've done something right because I just wrote a little naive
crawler [2] in a matter of minutes to pull down what looks like all
the rdfa you've put out there so far. It seems to have collected about
11,427 triples [3]. My rdfsum unix command line hack [4] came up with
these rdf:type counts:

   1533 <http://www.openarchives.org/ore/terms/AggregatedResource>
    526 <http://www.openarchives.org/ore/terms/ResourceMap>
    526 <http://www.openarchives.org/ore/terms/Aggregation>
    336 <http://zbw.eu/namespaces/skos-extensions/PmPage>
    185 <http://purl.org/ontology/bibo/Article>
      2 <http://zbw.eu/namespaces/skos-extensions/PmPersonFolder>
      2 <http://zbw.eu/namespaces/skos-extensions/PmCollection>

Does that sound about right for this initial release?
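
A tally like the one above can be approximated in a few lines. This is a minimal sketch and not the rdfsum tool itself: it assumes the harvested triples are serialized as N-Triples with IRI subjects and objects, and the sample lines are built from the identifiers quoted in this thread.

```python
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Matches a simple N-Triples line whose subject, predicate and object are IRIs.
# Lines with blank nodes or literals are skipped by this sketch.
TRIPLE = re.compile(r"^\s*(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>)\s*\.\s*$")


def count_types(lines):
    """Count how often each class IRI appears as the object of rdf:type."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            counts[match.group(3)] += 1
    return counts


SAMPLE = """\
<http://zbw.eu/beta/pm20/person/00012> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.openarchives.org/ore/terms/Aggregation> .
<http://zbw.eu/beta/pm20/person/00012> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://zbw.eu/namespaces/skos-extensions/PmPersonFolder> .
<http://zbw.eu/beta/pm20/person/00012> <http://purl.org/dc/terms/subject> <http://d-nb.info/gnd/118646419> .
""".splitlines()

for type_iri, n in count_types(SAMPLE).most_common():
    print(f"{n:7d} {type_iri}")
```

Running the same function over a full N-Triples dump would give one count per class, sorted from most to least frequent.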

I noticed that you have chosen to link to names in the German National
Authority file like:

  <http://zbw.eu/beta/pm20/person/00012> dct:subject
<http://d-nb.info/gnd/118646419> .

I seem to remember hearing at SWIB09 [5] that the Deutsche National
Bibliothek was thinking about minting URIs for entries in the
authority file that follow Linked Data best practices (hash or 303,
etc). Were you planning on modifying these appropriately when those
URLs became available? Right now the d-nb URL returns 200 OK, and it
isn't a hash URI. Theoretically it would be pretty easy to layer in
some rdfa into the page at d-nb that describes:

  http://d-nb.info/gnd/118646419#person

But I realize this is somewhat out of your control. I guess it would
also be possible to create a partial PURL [6] for
http://d-nb.info/gnd/ that would redirect, since I think the new PURL
software supports 303.
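
The hash-or-303 distinction above can be sketched as a small classifier. This is an illustrative helper under stated assumptions, not an established API: the function name and return labels are made up, and the status code is passed in by the caller rather than fetched, so the example runs without network access.

```python
from urllib.parse import urlparse


def linked_data_style(uri, status_code):
    """Classify how a URI for a real-world thing behaves when dereferenced.

    status_code is the HTTP status observed for the URI without its fragment.
    """
    if urlparse(uri).fragment:
        return "hash"      # the fragment is stripped before the HTTP request
    if status_code == 303:
        return "303"       # See Other: redirects to a describing document
    if status_code == 200:
        return "document"  # the URI identifies the web page itself
    return "unknown"


print(linked_data_style("http://d-nb.info/gnd/118646419", 200))
# prints document
print(linked_data_style("http://d-nb.info/gnd/118646419#person", 200))
# prints hash
```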

I was also interested to see that you have published some SKOS
Extensions [7] that are used to type each ore:Aggregation as a
specialization of skos:Concept:

<http://zbw.eu/beta/pm20/person/00012>
    a ore:Aggregation,
<http://zbw.eu/namespaces/skos-extensions/PmPersonFolder> ;
    skos:prefLabel "Abbe, Ernst; 1840-1905 (PM20 Personenarchiv)"@de,
"Abbe, Ernst; 1840-1905 (PM20 Persons Archives)"@en .

It looks like the rdf that comes back for your skos extensions
vocabulary (nice hack with the rdf validator btw) doesn't define
PmPersonFolder--but perhaps I missed it? I'm guessing from the
skos:prefLabel assertion that the PmPersonFolder is a specialization
of skos:Concept?

Would it be OK for me to experiment with pulling down the aggregated
resource bitstreams (jpg, etc) and storing them on disk? It would just
be a single threaded little script. Part of the rationale behind the
ore use at LC [1] is to foster LOCKSS [8] scenarios where digital
objects are easier to meaningfully harvest.
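
As a rough sketch of the planning step for such a harvest, the following picks out aggregated image URLs and maps each to a local path. The assumptions are loud ones: the triples are N-Triples, images are recognised by file suffix only, and the example.org URLs are hypothetical placeholders rather than real PM20 addresses. The actual download is left out.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

ORE_AGGREGATES = "<http://www.openarchives.org/ore/terms/aggregates>"
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+(<[^>]*>)\s+<([^>]*)>\s*\.\s*$")


def harvest_plan(lines, suffixes=(".jpg", ".jpeg")):
    """Map each aggregated image URL to a relative path mirroring the URL."""
    plan = {}
    for line in lines:
        match = TRIPLE.match(line)
        if not match or match.group(2) != ORE_AGGREGATES:
            continue
        url = match.group(3)
        parsed = urlparse(url)
        if parsed.path.lower().endswith(suffixes):
            local = PurePosixPath(parsed.netloc) / parsed.path.lstrip("/")
            plan[url] = str(local)
    return plan


# Hypothetical sample data; the URLs are placeholders, not real resources.
SAMPLE = [
    "<http://example.org/pm20/person/00012> "
    "<http://www.openarchives.org/ore/terms/aggregates> "
    "<http://example.org/pm20/img/0001.jpg> .",
    "<http://example.org/pm20/person/00012> "
    "<http://www.openarchives.org/ore/terms/aggregates> "
    "<http://example.org/pm20/doc/0001.html> .",
]

for url, path in harvest_plan(SAMPLE).items():
    print(url, "->", path)
```

A single-threaded harvester could then walk this mapping and fetch one URL at a time, pausing between requests.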

Anyhow, I have rattled on enough for now I suppose -- I mainly wanted
to say how exciting it was to see your announcement, being from the
digital library tribe in the linked data community :-)

//Ed

[1] http://chroniclingamerica.loc.gov
[2] http://inkdroid.org/bzr/ptolemy/crawl.py
[3] http://inkdroid.org/data/pm20.txt
[4] http://inkdroid.org/bzr/bin/rdfsum
[5] http://www.swib09.de/
[6] http://purl.org
[7] http://zbw.eu/namespaces/skos-extensions/
[8] http://en.wikipedia.org/wiki/LOCKSS

