Re: Implementing METS and PREMIS in Invenio. Ideas?

2007-06-28 Thread Jerome Caffaro

Hi Ferran,

Unfortunately we have no resources available for implementing PREMIS or
METS in CDS Invenio this year.


Still, we have discussed support for these standards internally in the
past, and would be interested in collaborating as much as we can if you
are willing to implement them on your side.


The implementation in CDS Invenio does seem feasible without big changes 
in the software, although a deeper analysis would be necessary.


Do you have any news about this project/petition since your last
email?


Best regards,

Jerome

PS: nice crash course!

Ferran Jorba wrote:

Hi all,

I'm writing this message so we can gain some input about a petition
we have at UAB that could be potentially useful for other users:
implementing METS and PREMIS in Invenio.

I'm attaching a crash course on METS, MODS, PREMIS and MIX at the end of
this message for the benefit of those who haven't had a chance to look
at them.

My question is whether PREMIS and METS are in the Invenio pipeline
(I haven't seen them at https://savannah.cern.ch/task/?group=cdsware).
I'd also like to collect some preliminary ideas about whether they
could be implemented, and how.

From what I have read, PREMIS should not be mixed into the descriptive
metadata (MARCXML in Invenio's case).  My first, preliminary conclusion
is that it would be better kept in separate tables, with the data
`pulled out' only when needed, whether in basic Web browsing or via the
OAI server.  Technical details of the digital objects are better
extracted automatically by software (e.g., ImageMagick or JHOVE).
Permissions and copyright issues are also dealt with separately.
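
The separate-tables idea above could be sketched roughly as follows.
This is only an illustration using Python's sqlite3 module, not
Invenio's real database layer: the table names (record, premis_object),
the column names and the get_record() helper are all hypothetical.  The
point is just that preservation data sits apart from the MARCXML and is
fetched only when the caller asks for it.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- descriptive metadata, one row per record
        CREATE TABLE record (
            id      INTEGER PRIMARY KEY,
            marcxml TEXT NOT NULL
        );
        -- preservation metadata, kept apart from the MARCXML
        CREATE TABLE premis_object (
            record_id   INTEGER NOT NULL REFERENCES record(id),
            file_name   TEXT NOT NULL,
            format_name TEXT,
            size_bytes  INTEGER
        );
        """
    )
    return conn


def get_record(conn, rec_id, with_preservation=False):
    """Return the descriptive record; add PREMIS rows only on request."""
    row = conn.execute(
        "SELECT marcxml FROM record WHERE id = ?", (rec_id,)
    ).fetchone()
    if row is None:
        return None
    result = {"marcxml": row[0]}
    if with_preservation:
        # The second query runs only when preservation data is wanted,
        # e.g. for a METS export rather than for ordinary browsing.
        rows = conn.execute(
            "SELECT file_name, format_name, size_bytes "
            "FROM premis_object WHERE record_id = ? ORDER BY file_name",
            (rec_id,),
        ).fetchall()
        keys = ("file_name", "format_name", "size_bytes")
        result["premis"] = [dict(zip(keys, r)) for r in rows]
    return result
```

Ordinary browsing would call get_record(conn, rec_id) and never touch
the preservation table; an export would pass with_preservation=True.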

In our case, the Spanish Ministry of Culture is offering grants for the
digitisation of old journals to improve access to the historical press
(http://prensahistorica.mcu.es/), and METS and PREMIS compliance gives
'extra points', so to speak.

I know it can be hard to say anything with this little information, but
I'd like to hear CERN's ideas on this issue (and sooner rather than
later, given our timetable ;-)

Thanks a lot,

Ferran

---

Crash course on METS, MODS, MIX and PREMIS

First of all, all these standards are endorsed by the Library of
Congress.  On their standards page (http://www.loc.gov/standards/) there
is a one-sentence description of each of them, plus all the details on
their respective pages.  However, it took me a while until I `got' them
and put them all into perspective, and that is the humble purpose of
these paragraphs.  Please take them very cautiously; I've only just
learned them and I'm no expert.  That said, here we go:

In the world of digital preservation, there is agreement that it is
necessary to keep metadata of several kinds for each digital object, so
that preservation policies can be applied, now or in the future.  This
metadata can (or must) be of several kinds:

- Descriptive: examples are the well-known MARC or MARCXML, Dublin Core
  or MODS.  MODS (Metadata Object Description Schema,
  http://www.loc.gov/standards/mods/) is, roughly speaking, a subset of
  MARC21, but richer than Dublin Core.  Invenio already provides two of
  them, no problem here, and an optional MODS output
  (http://www.loc.gov/standards/mods/mods-mapping.html) can be worked
  out when XML bibformats stabilise.

- Administrative: including rights and permissions, provenance (origin)
  and structural.  The preservation ones are expressed in PREMIS
  (http://www.loc.gov/standards/premis/).

- Technical, such as image details (MIX,
  http://www.loc.gov/standards/mix/) or text details (textMD).

METS (http://www.loc.gov/standards/mets/) basically wraps them all
together.
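
To make the wrapping concrete, here is a rough sketch of a METS
skeleton built with Python's xml.etree.ElementTree.  The element and
attribute names are the METS ones as I understand them; the IDs, the
build_mets() helper and the example URLs are made up for illustration.
The xmlData payloads are left empty, so the result shows the layout
only and should not be expected to validate against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def wrapped(parent, tag, sec_id, mdtype):
    """Add <tag ID=..><mdWrap MDTYPE=..><xmlData/></mdWrap></tag>."""
    sec = ET.SubElement(parent, q(tag), {"ID": sec_id})
    wrap = ET.SubElement(sec, q("mdWrap"), {"MDTYPE": mdtype})
    return ET.SubElement(wrap, q("xmlData"))  # payload would go here


def build_mets(page_urls):
    """Build a skeleton METS tree for one issue with scanned pages."""
    mets = ET.Element(q("mets"))

    # Descriptive metadata: the MARCXML record would be embedded here.
    wrapped(mets, "dmdSec", "DMD1", "MARC")

    # Administrative metadata: one MIX block per page, then rights,
    # then PREMIS provenance.
    amd = ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})
    for n, _ in enumerate(page_urls, start=1):
        wrapped(amd, "techMD", "TECH%d" % n, "NISOIMG")
    wrapped(amd, "rightsMD", "RIGHTS1", "OTHER")  # rights schema left open
    wrapped(amd, "digiprovMD", "PROV1", "PREMIS")

    # File inventory: one file per scanned page.
    grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    for n, url in enumerate(page_urls, start=1):
        f = ET.SubElement(
            grp, q("file"), {"ID": "FILE%d" % n, "ADMID": "TECH%d" % n}
        )
        ET.SubElement(
            f, q("FLocat"),
            {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: url},
        )

    # Structural map: the issue contains pages in reading order.
    smap = ET.SubElement(mets, q("structMap"), {"TYPE": "physical"})
    issue = ET.SubElement(smap, q("div"), {"TYPE": "issue", "DMDID": "DMD1"})
    for n, _ in enumerate(page_urls, start=1):
        page = ET.SubElement(issue, q("div"), {"TYPE": "page", "ORDER": str(n)})
        ET.SubElement(page, q("fptr"), {"FILEID": "FILE%d" % n})
    return mets


if __name__ == "__main__":
    tree = build_mets(["http://example.org/p1.tif", "http://example.org/p2.tif"])
    print(ET.tostring(tree, encoding="unicode"))
```

Even in this stripped-down form, every scanned page adds a techMD
block, a file entry and a structMap div, so the document grows with
the number of pages.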





Re: Implementing METS and PREMIS in Invenio. Ideas?

2007-06-28 Thread Ferran Jorba
Hello Jerome,

 Unfortunately we have no resources available for implementing PREMIS or
 METS in CDS Invenio this year.

 Still we have been discussing internally about support for these
 standards in the past, and would be interested to collaborate as much
 as we can if you are willing to implement these standards on your
 side.

I'll be happy to help, although I cannot be a full-time developer; that
is not my job here at UAB.  I'm glad to provide fixes, ideas, testing
and small patches, as always (I do the translations at home, during my
copious free time), but most of my time goes into making things work,
not into developing large projects.

 The implementation in CDS Invenio does seem feasible without big
 changes in the software, although a deeper analysis would be
 necessary.

I agree with you.  I'd like to see more real-world examples of how it
is implemented, because I cannot imagine how a METS record would work
with descriptive metadata (MARCXML), some rights and origin
information, strong structural information and MIX detail for each
scanned page of a medium-size journal (e.g., our
http://ddd.uab.cat/record/17654).  Retrieving such a METS record could
bring the system to its knees.

 Do you have any news about this project/petition since your last
 email?

Nope.  Yours is the first.

Thanks,

Ferran