[Fedora-commons-developers] Planets/Fedora integration

Asger Askov Blekinge Thu, 08 Apr 2010 09:26:06 -0700

Hi

Planets (Planets-project.eu) is a big preservation project funded by the
European Commission. If you know of it, you know of it; otherwise I can
tell you more, but that is not the purpose of this mail.


My position is partly funded by this project. As part of my work, I
am to build a Planets/Fedora integration. This mail explains the
strategy I chose and the results I have so far. The content is rather
technical and sketchy, but I wanted to get this information out there
now, as I have been postponing it forever.

Firstly, a brief presentation of the Planets Interoperability Framework
(IF). The IF uses DigitalObjects as its basic object. A Planets digital
object looks somewhat like this:

DigitalObject
{
  String title;
  URI format, permanentUri;
  List<Metadata> metadata;
  List<Event> events;
  Content content;
}
Metadata
{
  String content, name;
  URI type;
}
Event
{
  String summary, datetime;
  //other stuff....
}
Content
{
  // URL or other data stuff
}

Planets wants to integrate with Fedora, i.e., to be able to use Fedora as a
storage backend. For storage, Planets defined this API:

public interface DigitalObjectManager {

    public URI storeAsNew(DigitalObject digitalObject)
            throws DigitalObjectNotStoredException;

    public URI storeAsNew(URI pdURI, DigitalObject digitalObject)
            throws DigitalObjectNotStoredException;

    public URI updateExisting(URI pdURI, DigitalObject digitalObject)
            throws DigitalObjectNotStoredException, DigitalObjectNotFoundException;

    public boolean isWritable(URI pdURI);

    public List<URI> list(URI pdURI);

    public DigitalObject retrieve(URI pdURI)
            throws DigitalObjectNotFoundException;
}

Basically, one should be able to list all the "subobjects" beneath a
single object, retrieve any object, store a Planets object as a new
object in the storage, and update an object already in the storage.
Basic stuff, really.
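
To make the contract concrete, here is a minimal in-memory sketch of the same
operations. It is not the Planets code: SimpleDigitalObject and the two
exception classes are stripped-down stand-ins, InMemoryObjectManager is a
hypothetical name, and the reading that list() returns the URIs exactly one
path segment below the given URI is an assumption.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for the Planets DigitalObject; only the title is modelled.
class SimpleDigitalObject {
    final String title;
    SimpleDigitalObject(String title) { this.title = title; }
}

// Stand-ins for the two exception types named in the interface.
class DigitalObjectNotStoredException extends Exception {
    DigitalObjectNotStoredException(String message) { super(message); }
}

class DigitalObjectNotFoundException extends Exception {
    DigitalObjectNotFoundException(String message) { super(message); }
}

// Hypothetical in-memory store with the same operations as DigitalObjectManager.
class InMemoryObjectManager {
    private final Map<URI, SimpleDigitalObject> objects = new TreeMap<URI, SimpleDigitalObject>();
    private final URI root;
    private int counter = 0;

    InMemoryObjectManager(URI root) { this.root = root; }

    // Stores the object under a generated URI directly beneath the root.
    URI storeAsNew(SimpleDigitalObject object) throws DigitalObjectNotStoredException {
        counter++;
        return storeAsNew(URI.create(root + "/obj" + counter), object);
    }

    // Stores the object under the given URI; refuses to overwrite.
    URI storeAsNew(URI pdURI, SimpleDigitalObject object) throws DigitalObjectNotStoredException {
        if (objects.containsKey(pdURI)) {
            throw new DigitalObjectNotStoredException("Already exists: " + pdURI);
        }
        objects.put(pdURI, object);
        return pdURI;
    }

    // Replaces an object that must already exist.
    URI updateExisting(URI pdURI, SimpleDigitalObject object) throws DigitalObjectNotFoundException {
        if (!objects.containsKey(pdURI)) {
            throw new DigitalObjectNotFoundException("No such object: " + pdURI);
        }
        objects.put(pdURI, object);
        return pdURI;
    }

    // Only URIs under this store's root are considered writable.
    boolean isWritable(URI pdURI) {
        return pdURI.toString().startsWith(root.toString());
    }

    // Assumed semantics: children are URIs exactly one path segment below pdURI.
    List<URI> list(URI pdURI) {
        String prefix = pdURI + "/";
        List<URI> children = new ArrayList<URI>();
        for (URI candidate : objects.keySet()) {
            String text = candidate.toString();
            if (text.startsWith(prefix) && text.indexOf('/', prefix.length()) < 0) {
                children.add(candidate);
            }
        }
        return children;
    }

    SimpleDigitalObject retrieve(URI pdURI) throws DigitalObjectNotFoundException {
        SimpleDigitalObject found = objects.get(pdURI);
        if (found == null) {
            throw new DigitalObjectNotFoundException("No such object: " + pdURI);
        }
        return found;
    }
}
```

A Fedora-backed implementation would keep the same method shapes and replace
the map with calls to the repository.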


There are many ways of doing this. The first problem I faced was the
structure of the Planets objects. They are a lot less general than the
Fedora objects. I wanted to keep the flexibility of Fedora, and I did not
want people to have to rewrite all their data objects in order to use
Planets.

The problem is simply that a Fedora object
{
 String pid, label;
 List<Datastream> datastreams;
}
should be mapped to a Planets object.

I chose content models as the way of doing this. 
The content model should specify:
 * Which datastream should be regarded as the content
 * Which datastreams should be regarded as metadata
 * How events should be mapped

This can be expressed in a simple XML fragment, which can be stored in
a datastream in a content model. Events are ignored, because they are
difficult to work with, and the Planets system doesn't really use them
anyway.

<planetsDatastream>

  <contentdatastream>
    <name>CONTENTS</name>
  </contentdatastream>

  <metadatastreams>
    <metadatastream>
      <planetsName>DC</planetsName>
      <name>DC</name>
    </metadatastream>
  </metadatastreams>

</planetsDatastream>
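
As an illustration, a fragment like the one above can be read with the
standard Java DOM parser. This is only a sketch: PlanetsMapping is a
hypothetical holder class, and it assumes that <name> is the Fedora
datastream ID and <planetsName> the name the metadata gets on the Planets
side.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical holder for the mapping described by the XML fragment.
class PlanetsMapping {
    // Fedora datastream ID that holds the content.
    String contentDatastream;
    // Assumed direction: Fedora datastream ID -> Planets metadata name.
    final Map<String, String> metadataStreams = new LinkedHashMap<String, String>();

    static PlanetsMapping parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        PlanetsMapping mapping = new PlanetsMapping();

        // Exactly one <contentdatastream> is expected.
        Element content = (Element) doc.getElementsByTagName("contentdatastream").item(0);
        mapping.contentDatastream = childText(content, "name");

        // Any number of <metadatastream> entries may follow.
        NodeList streams = doc.getElementsByTagName("metadatastream");
        for (int i = 0; i < streams.getLength(); i++) {
            Element stream = (Element) streams.item(i);
            mapping.metadataStreams.put(childText(stream, "name"),
                                        childText(stream, "planetsName"));
        }
        return mapping;
    }

    // Text of the first descendant element with the given tag name.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

For the fragment above, this would give CONTENTS as the content datastream
and a single metadata entry mapping DC to DC.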


With this, to retrieve a Planets object from Fedora, the following
happens:
 * The Fedora object is retrieved.
 * The Planets content model of the Fedora object is retrieved.
 * A new Planets object is created.
 * The content of the Planets object is set to the URL of the specified
content datastream.
 * Each of the metadata datastreams is retrieved, and its content is
stored in the Planets object.
 * Other properties, such as the title, are set.
 * The Planets object is returned.
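
The retrieval steps above can be sketched roughly as follows. The classes
are stand-ins, not the real Fedora or Planets types. Two things are
assumptions: that the title is taken from the Fedora label, and that the
datastream URL follows the Fedora 3 REST pattern
/objects/{pid}/datastreams/{dsID}/content.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a Fedora object: pid, label, and datastream bodies by ID.
class FedoraObjectStub {
    final String pid;
    final String label;
    final Map<String, String> datastreams = new LinkedHashMap<String, String>();

    FedoraObjectStub(String pid, String label) {
        this.pid = pid;
        this.label = label;
    }
}

// Stand-in for a Planets DigitalObject.
class PlanetsObjectStub {
    String title;
    URI content;
    final Map<String, String> metadata = new LinkedHashMap<String, String>();
}

class RetrieveSketch {
    // contentDs and metadataDs (Fedora ID -> Planets name) come from the
    // Planets content model of the Fedora object.
    static PlanetsObjectStub toPlanets(FedoraObjectStub fedora, String baseUrl,
            String contentDs, Map<String, String> metadataDs) {
        PlanetsObjectStub planets = new PlanetsObjectStub();

        // The content is passed by reference: a URL pointing at the datastream.
        planets.content = URI.create(baseUrl + "/objects/" + fedora.pid
                + "/datastreams/" + contentDs + "/content");

        // Metadata datastreams are copied by value; missing ones are skipped.
        for (Map.Entry<String, String> entry : metadataDs.entrySet()) {
            String body = fedora.datastreams.get(entry.getKey());
            if (body != null) {
                planets.metadata.put(entry.getValue(), body);
            }
        }

        // Assumption: the Fedora label becomes the Planets title.
        planets.title = fedora.label;
        return planets;
    }
}
```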


To update a Planets object, the following happens:
 * The Fedora object is retrieved.
 * The Planets content model of the Fedora object is retrieved.
 * If the Planets object content refers to a datastream inside the
Fedora object, we know the content has not changed.
 * Otherwise, the Planets content is stored in place of the content in
the Fedora object (as managed content).
 * Each of the metadata datastreams is replaced with the version from
the Planets object.
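
A rough sketch of the update logic, under the assumption that "refers to a
datastream inside the Fedora object" can be tested by comparing URL
prefixes. UpdateSketch and its parameters are hypothetical, and the map of
datastream bodies stands in for the real Fedora object.

```java
import java.net.URI;
import java.util.Map;

class UpdateSketch {
    /**
     * fedoraDatastreams: datastream ID -> body (stand-in for the Fedora object).
     * objectUrl:         URL of the Fedora object itself.
     * metadataMapping:   Fedora datastream ID -> Planets metadata name.
     * Returns true if the content datastream was replaced.
     */
    static boolean applyUpdate(Map<String, String> fedoraDatastreams, String objectUrl,
            String contentDs, Map<String, String> metadataMapping,
            URI planetsContent, Map<String, String> planetsMetadata) {
        boolean replaced = false;

        // Content that points back into this object is taken as unchanged.
        if (!planetsContent.toString().startsWith(objectUrl + "/datastreams/")) {
            // Stand-in for fetching the data and storing it as managed content.
            fedoraDatastreams.put(contentDs, "managed copy of " + planetsContent);
            replaced = true;
        }

        // Metadata datastreams are overwritten with the Planets versions.
        for (Map.Entry<String, String> entry : metadataMapping.entrySet()) {
            String body = planetsMetadata.get(entry.getValue());
            if (body != null) {
                fedoraDatastreams.put(entry.getKey(), body);
            }
        }
        return replaced;
    }
}
```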

To store a Planets object as a new Fedora object, the following happens:
 * A new Fedora object is created.
 * The new Fedora object is given a reference to the default Planets
content model.
 * Each of the Planets datastreams is created in the new object and
filled with the data from the Planets object.
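
The mail does not say how the reference to the content model is recorded.
In Fedora 3, an object is normally tied to its content model through a
hasModel relation in the RELS-EXT datastream, so the sketch below assumes
that mechanism. NewObjectSketch and both PIDs are hypothetical.

```java
class NewObjectSketch {
    // Builds a RELS-EXT body that points the new object at a content model.
    static String relsExt(String pid, String contentModelPid) {
        return "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n"
             + "         xmlns:fedora-model=\"info:fedora/fedora-system:def/model#\">\n"
             + "  <rdf:Description rdf:about=\"info:fedora/" + pid + "\">\n"
             + "    <fedora-model:hasModel rdf:resource=\"info:fedora/"
             + contentModelPid + "\"/>\n"
             + "  </rdf:Description>\n"
             + "</rdf:RDF>\n";
    }
}
```

Calling relsExt("demo:1", "demo:PlanetsDefaultCM") would produce the RDF
that marks demo:1 as an instance of the (made-up) default Planets content
model.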


In Fedora, there exist two or more special objects:

PlanetsContentModelContentModel: The content model that all Planets
content models must have. It requires the existence of the
planetsdatastream and has a relation to the default Planets content
model.

One or more Planets content models.

Thus, different Fedora objects can have different Planets content models,
so they do not all have to map to Planets the same way.



This was the very sketchy explanation of how I have made the
Planets/Fedora integration. I have made a simple library that implements
the DigitalObjectManager interface mentioned above and uses FedoraClient
to communicate with Fedora. I will find a home for this on the Fedora
pages or SourceForge soonish, and give a better presentation.

Regards
Asger


