Hello !
I'm happy to share this interest in digital standards, though I fully
understand that your ultimate concern is planning the development effort.
So I reply to your questions with the sole aim of offering ideas for your
concrete study of the Invenio data-model.

On Tue, Jun 14, 2011 at 8:08 PM, Piotr Praczyk <[email protected]> wrote:

> This is not the use case of figures from scientific publications (If I
> understand correctly, looks rather like digitalisation of entire documents),
> though seems to be relevant for Invenio/Inspire in general. Looks like a
> nice benchmark of the underlying data-structures.
>
OK, I understand. And I agree.


>  >Speaking in concrete words: in my experience quite every time I saw,
> >- descriptive-metadata (like MARC) managed on one side with specific
> (multiple) identifiers (..also modifiable identifiers, in the collaborative
> systems..)
> >- digital-repositories, on the other side, with specific (stable!!)
> identifiers for digital-objects and their component files,
> >- and, in the middle, digital-metadata (like METS) which guarantee the
> connections (regardless the physical file storage).
>
> I think, I did not understand this part.
>
I used this example only to argue that (in the field of book
digitization) descriptive-metadata usually keep changing, while
digital-metadata remain stable. (That's why we benefit from a separation
between standards like MARC and METS on the two sides).

> What are the cases of modifiable identifiers inside MARC ? Titles of
> documents + authors ?

It's the "extreme case" I have to deal with in my Invenio tests :-(
It happens (in a collaborative library network) when two MARC records
describe the same publication, maybe coming from two different libraries
which described their own book-copies. The two records can be merged (say:
"B" is chosen and includes the copies of "A"), so that in the export from
that system I receive the new record ("B plus A copies"), with a reference
to the old record to be replaced (->"A").
In this case the Invenio "representation" of the MARC record can change
(..titles, authors and also the system-identifier), but it keeps its
internal Invenio ID, because the publication is the same (so I have to
preserve the information added in Invenio, like user-comments or
digitizations).
In my opinion, this doesn't affect the Invenio data-model at all; it only
affects the Invenio import procedures: personally, I worked at the level
of BibConvert. But Samuele recently (mailing-list, 2011/03/31, "RFC:
bibupload --merge for WebSubmit") explained that the merging procedure can
be done under human control using the new BibMerge web interface.
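
As a rough illustration of the merge case above, here is a minimal Python
sketch. It is not Invenio code: the Record class, the apply_merge function
and the dictionary-based store are all hypothetical names, and the sketch
assumes the store is keyed by the source system identifier. It only shows
the idea that the descriptive data and the system identifier are replaced
while the internal ID and the Invenio-side additions survive.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical in-memory stand-in for an Invenio record."""
    invenio_id: int      # internal ID, assumed never to change
    system_id: str       # identifier in the source library system
    title: str
    copies: list = field(default_factory=list)
    comments: list = field(default_factory=list)  # Invenio-side additions


def apply_merge(store, new_record, replaced_system_id):
    """Replace the record `replaced_system_id` with the incoming data.

    The descriptive data and the system identifier come from the new
    record; the internal ID and the comments are carried over.
    """
    old = store.pop(replaced_system_id)
    merged = Record(
        invenio_id=old.invenio_id,
        system_id=new_record["system_id"],
        title=new_record["title"],
        copies=list(new_record["copies"]),
        comments=old.comments,
    )
    store[merged.system_id] = merged
    return merged


# Usage: record "A" is replaced by "B plus A copies".
store = {
    "A": Record(invenio_id=42, system_id="A", title="Old title",
                copies=["copy-A1"], comments=["nice book"]),
}
incoming = {"system_id": "B", "title": "New title",
            "copies": ["copy-B1", "copy-A1"]}
merged = apply_merge(store, incoming, "A")
```

A real import would of course also have to handle the case where "B" is
already present in Invenio; that is left out here.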


> Exact file paths in the file system (as we happen to still have in some
> places in Inspire ?)
>
No no no: in my small case, please consider Invenio as a service-provider
that receives multiple data-flows, where each record gets a permanent
Invenio-identifier and permanent pointers to its digitizations. (I hope
this answers the question.)


> By link between two do you mean a document identifying the same document
> with both at the same time ?
>
I simply mean this:
- MARC could point to METS (e.g. using the 856 field for a link to a METS
file of the same record). But it's better if
- METS points to or embeds MARC, and points to the FILEs, describing their
features (md5, format, dimension, URL/URN/URI, ...), their document
structure, access rights, etc.
This is from the simple viewpoint of export (and, potentially, import).

From the viewpoint of the data managed internally (internally
created/modified/only_indexed), I know it's a different subject: I like
Samuele's expression "it would be nice to support METS in importing and
exporting, (by storing aside when importing anything that is not
understood, so that it can be re-exported)".
I interpret that in this way: Invenio could
- accept a (configurable?) selection of METS profiles, for import (after
validation?), storage (as an XML blob?), and export;
- and understand a (configurable?) selection of METS elements (extracted
from the blob with something like XPath or XQuery, and stored in Invenio
tables?), for its internal management (file pointers, simple document
structure and relations among files [versions, formats, pages]).
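
To make the "extract a selection of METS elements from the blob" idea a bit
more concrete, here is a small Python sketch using the XPath subset of the
standard xml.etree.ElementTree module. The METS fragment, the file names
and the function extract_file_pointers are invented for illustration; only
the element and attribute names (fileSec, fileGrp, file, FLocat, USE,
MIMETYPE, CHECKSUM, xlink:href) come from the METS schema. The returned
dictionaries stand in for rows that could be stored in internal tables.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Invented example blob; a real one would be stored as received on import.
METS_BLOB = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="F1" MIMETYPE="image/tiff"
                 CHECKSUM="0123abcd" CHECKSUMTYPE="MD5">
        <mets:FLocat LOCTYPE="URL" xlink:href="/object/DOC1/page1.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="reference">
      <mets:file ID="F2" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="/object/DOC1/page1.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def extract_file_pointers(mets_xml):
    """Return one dict per <file>, with the USE of its enclosing <fileGrp>."""
    root = ET.fromstring(mets_xml)
    href_attr = "{%s}href" % NS["xlink"]
    rows = []
    for grp in root.findall(".//mets:fileGrp", NS):
        for f in grp.findall("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            rows.append({
                "id": f.get("ID"),
                "use": grp.get("USE"),
                "mimetype": f.get("MIMETYPE"),
                "checksum": f.get("CHECKSUM"),
                "href": loc.get(href_attr) if loc is not None else None,
            })
    return rows


rows = extract_file_pointers(METS_BLOB)
```

Everything not extracted this way stays in the blob untouched, so it can be
re-exported as it came in.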


> What is physical storage for You ? From the physical storage I wanted to
> abstract exactly by providing such links /object/DOCID
>
I'm using "physical storage" in the general sense of disks / servers /
storage-center / external-service-for-digitization / or any other solution
through which the referenced files are accessed.
What I'm trying to propose is this: Invenio could guarantee
- persistence of identifiers and logical file-pointers (like your
/object/DOCID) within metadata; and
- configurable interpretation of the logical pointers.
Example: if a collection of images [/nnnn/CollX_DocY] grows so much that I
decide to move all files to a bigger storage system, I'll be able to
redirect all those pointers to the new system without changing the
metadata, only by "re-configuring" the resolver for that collection's
pointers.
(I know it's a trivial problem that can also be solved at a lower level
than the Invenio installation, but I wanted to share a big concern I saw
in digitization projects).
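
A minimal sketch of such a configurable resolver, in Python. All names are
assumptions made for the example: the pointer format
"/object/<collection>_<document>", the resolve function, and the example.org
storage URLs do not come from Invenio.

```python
def resolve(pointer, config):
    """Map a logical pointer to a physical URL using per-collection config.

    Assumes pointers of the form '/object/<collection>_<document>'.
    """
    prefix = "/object/"
    if not pointer.startswith(prefix):
        raise ValueError("not a logical pointer: %r" % pointer)
    name = pointer[len(prefix):]
    collection = name.partition("_")[0]
    base = config[collection]  # KeyError if the collection is not configured
    return "%s/%s" % (base.rstrip("/"), name)


# The metadata only ever stores the logical pointer.
pointer = "/object/CollX_DocY"

config = {"CollX": "http://old-storage.example.org/images"}
before = resolve(pointer, config)

# Moving the collection means changing one configuration entry only.
config["CollX"] = "http://big-storage.example.org/archive/"
after = resolve(pointer, config)
```

The pointer stored in the metadata is identical before and after the move;
only the resolver configuration differs.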

> I think, the word "version" creates confusion here as version in this sense
> is format in Invenio.
> The version which I was talking about is a number telling, how many times
> object was modified. Maybe revision is a better word.
>
Thank you: I really misunderstood. And of course I agree with you about the
importance of revisions too.
Indeed I think that this feature could also be supported in METS: please
take a look at this record of the Library of Congress (
http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.09601/contactsheet.html),
which contains 2 revisions (actually called "versions") of the same
picture, both in 2 formats (tiff and jpg).
[.. I'm still looking for an official METS example with: many pages
(sections / figures), in many formats, with many revisions...]


> I was rather thinking about providing a link between for example 1st
> revision of the full text (whichever format) and 3rd revision of a figure.
> Assigning data to connection between particular revisions will be important
> from the point of view of processing of figures.
>
Very interesting ! If I understand, you want to support the complete
work-flow on a document, with possible connections between any stage of any
component part (rather than the complete state of the final stage only).
Well: I really think that METS supports that, and it's only a matter of
conventional usage of its basic elements.
Referring only to the basic schema of METS (
http://sunsite3.berkeley.edu/mets/diagram/ and
http://www.loc.gov/standards/mets/docs/mets.v1-9.html) we find that:
- the <file> elements (with all their individual specifications about
formats, description and technical data) can be organized in any number of
nested <fileGrp> elements (schema type "fileGrpType").
- At every level the USE attribute can record information about its usage
(.. master, reference, thumbnails..).
- And revisions (of each file format) can be recorded within dedicated
<fileGrp> elements using the VERSDATE attribute ("An optional dateTime
attribute specifying the date this version/fileGrp of the digital object was
created")
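
One possible convention (only an assumption, not an official METS profile)
is an outer <fileGrp> per figure, containing one inner <fileGrp> per
revision, dated with VERSDATE and holding one <file> per format. The Python
sketch below reads such a structure with the standard
xml.etree.ElementTree module; the fragment, the IDs and the revisions
function are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Invented fragment: one figure, two revisions, two formats per revision.
FILESEC = """<mets:fileSec xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileGrp ID="figure1">
    <mets:fileGrp VERSDATE="2011-06-10T09:30:00" USE="master">
      <mets:file ID="fig1_r2_tiff" MIMETYPE="image/tiff"/>
      <mets:file ID="fig1_r2_jpg" MIMETYPE="image/jpeg"/>
    </mets:fileGrp>
    <mets:fileGrp VERSDATE="2011-05-01T10:00:00" USE="master">
      <mets:file ID="fig1_r1_tiff" MIMETYPE="image/tiff"/>
      <mets:file ID="fig1_r1_jpg" MIMETYPE="image/jpeg"/>
    </mets:fileGrp>
  </mets:fileGrp>
</mets:fileSec>"""


def revisions(filesec_xml, group_id):
    """List (VERSDATE, [file IDs]) for each revision group, oldest first."""
    root = ET.fromstring(filesec_xml)
    outer = root.find("mets:fileGrp[@ID='%s']" % group_id, NS)
    if outer is None:
        return []
    result = []
    for grp in outer.findall("mets:fileGrp", NS):
        files = [f.get("ID") for f in grp.findall("mets:file", NS)]
        result.append((grp.get("VERSDATE"), files))
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    return sorted(result)


revs = revisions(FILESEC, "figure1")
```

With stable file IDs like these, a link between, say, the 1st revision of
the full text and the 3rd revision of a figure reduces to a pair of IDs.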


> Looking from my perspective, I think it would be nice to repeat the example
> in custom XML I proposed few mails ago and see if it can be easily
> reproduced..
>
You are right: some concrete examples (..let me only start with official
examples..):
- The above LoC record has a METS file which contains all the most important
information (
http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.09601/mets.xml):
please look at the last 35 lines (and the wrapped MARC description, at lines
84-291)
- The METS+PREMIS profile export of the same record (
http://www.loc.gov/standards/premis/louis-2-0.xml) presents additional
information about all the events performed for storage (lines 472-639):
"validation, ingestion, migration"

[..Maybe a simple re-use of these files, inserting your data, could be a
valid starting example?]

Thanks very much for your attention (..I don't know whether I'm annoying the
mailing-list, so feel free to ask me directly whatever you think I know)

Cheers
Cristian
