Hi Ferran,
Unfortunately we have no resources available for implementing PREMIS or
METS in CDS Invenio this year.
Still, we have discussed support for these standards internally in the
past, and would be interested in collaborating as much as we can if you
are willing to implement them on your side.
The implementation in CDS Invenio does seem feasible without big changes
in the software, although a deeper analysis would be necessary.
Do you have any news about this project/request since your last
email?
Best regards,
Jerome
PS: nice crash course!
Ferran Jorba wrote:
Hi all,
I'm writing this message to gather some input about a request we have
at UAB that could potentially be useful to other users:
implementing METS and PREMIS in Invenio.
I'm attaching a crash course on METS, MODS, PREMIS and MIX at the end of
this message for the benefit of those who haven't had a chance to look
at them.
My question is whether PREMIS and METS are in the Invenio pipeline
(I haven't seen them at
https://savannah.cern.ch/task/?group=cdsware). I'd also like to collect
some preliminary ideas about whether they could be implemented, and how.
From what I have read, PREMIS should not be mixed with descriptive
metadata (MARCXML in Invenio's case). My first, preliminary conclusion
is that it had better live in separate tables, with the data pulled out
only when needed, whether in basic Web browsing or via the OAI server.
Technical details of the digital objects are best extracted
automatically by software (e.g., ImageMagick or JHOVE). Permissions and
copyright issues are also dealt with separately.
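To make the separate-tables idea concrete, here is a minimal sketch in
Python. It uses an in-memory SQLite database purely as a stand-in; the
table names, column names and sample values are all hypothetical and do
not reflect Invenio's actual schema. It only shows that basic browsing
can read the descriptive record without touching preservation data,
which is fetched on demand.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not Invenio's real database layout.
SCHEMA = """
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    marcxml TEXT NOT NULL
);
CREATE TABLE preservation_event (
    id INTEGER PRIMARY KEY,
    record_id INTEGER NOT NULL REFERENCES record(id),
    event_type TEXT NOT NULL,
    event_date TEXT NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the two separate tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def get_descriptive(conn, record_id):
    """Basic browsing: reads only the descriptive (MARCXML) table."""
    row = conn.execute(
        "SELECT marcxml FROM record WHERE id = ?", (record_id,)
    ).fetchone()
    return row[0] if row else None


def get_preservation_events(conn, record_id):
    """Pulled only when needed, e.g. for a METS/PREMIS export."""
    return conn.execute(
        "SELECT event_type, event_date FROM preservation_event "
        "WHERE record_id = ? ORDER BY event_date",
        (record_id,),
    ).fetchall()


# Sample data, invented for the example.
conn = open_db()
conn.execute("INSERT INTO record (id, marcxml) VALUES (1, '<record/>')")
conn.execute(
    "INSERT INTO preservation_event (record_id, event_type, event_date) "
    "VALUES (1, 'capture', '2008-01-15')"
)
```

An OAI export could call both functions and merge the results, while
the ordinary record display would call only get_descriptive().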
In our case, the Spanish Ministry of Culture is offering grants for
digitising old journals to improve access to the historical press
(http://prensahistorica.mcu.es/), and METS and PREMIS compliance gives
'extra points', so to speak.
I know it can be hard to say anything with this little information, but
I'd like to hear CERN's ideas on this issue (sooner rather than later,
given our timetable ;-)
Thanks a lot,
Ferran
---
Crash course on METS, MODS, MIX and PREMIS
First of all, all these standards are endorsed by the Library of
Congress. On their standards page (http://www.loc.gov/standards/) there
is a one-sentence description of each of them, plus all the details on
their respective pages. However, it took me a while until I `got' them
and put them all into perspective, and that is the humble purpose of
these paragraphs. Please take them very cautiously; I've only just
learned them and I'm no expert. That said, here we go:
In the world of digital preservation, there is agreement that it is
necessary to keep metadata of several kinds for each digital object, so
that preservation policies can be applied, now or in the future. This
metadata can (or must) be of several kinds:
- Descriptive: examples are the well-known MARC or MARCXML, Dublin Core
or MODS. MODS (Metadata Object Description Schema,
http://www.loc.gov/standards/mods/) is, roughly speaking, a subset of
MARC21, but richer than Dublin Core. Invenio already provides two of
them, so no problem here, and an optional MODS output
(http://www.loc.gov/standards/mods/mods-mapping.html) can be worked
out when XML bibformats stabilise.
- Administrative: including rights and permissions, provenance (origin)
and structural. The preservation ones are expressed in PREMIS
(http://www.loc.gov/standards/premis/).
- Technical, such as image details (MIX,
http://www.loc.gov/standards/mix/) or text details (textMD).
METS (http://www.loc.gov/standards/mets/) basically wraps them all
together.
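
As a rough illustration of how METS wraps the other formats, the
following Python sketch builds a skeleton METS document with one
descriptive section and one administrative section holding technical
and preservation metadata. The helper names (q, wrap, build_mets), the
ID values and the empty placeholder payloads are assumptions made for
the example; a real METS file would also need the mandatory structMap
and normally a fileSec, both omitted here for brevity.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Return the namespace-qualified METS tag name."""
    return "{%s}%s" % (METS_NS, tag)


def wrap(parent, section, sec_id, mdtype, payload):
    """Embed an XML payload as <section><mdWrap><xmlData>payload."""
    sec = ET.SubElement(parent, q(section), {"ID": sec_id})
    md_wrap = ET.SubElement(sec, q("mdWrap"), {"MDTYPE": mdtype})
    xml_data = ET.SubElement(md_wrap, q("xmlData"))
    xml_data.append(payload)
    return sec


def build_mets(descriptive, technical, preservation):
    """Wrap the three kinds of metadata in one METS element."""
    mets = ET.Element(q("mets"))
    # Descriptive metadata (e.g. MARCXML) goes in a dmdSec.
    wrap(mets, "dmdSec", "DMD1", "MARC", descriptive)
    # Technical (MIX) and preservation (PREMIS) metadata go in the amdSec.
    amd = ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})
    wrap(amd, "techMD", "TECH1", "NISOIMG", technical)
    wrap(amd, "digiprovMD", "PROV1", "PREMIS", preservation)
    return mets


# Empty placeholder payloads; real ones would be full MARCXML, MIX and
# PREMIS documents in their own namespaces.
mets = build_mets(
    ET.Element("record"), ET.Element("mix"), ET.Element("premis")
)
print(ET.tostring(mets, encoding="unicode"))
```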