Hello! I'm happy to share this interest in digital standards, though I fully understand that your ultimate concern is planning the development effort. So I reply to your questions with the sole aim of offering ideas for your concrete study of the Invenio data model.

On Tue, Jun 14, 2011 at 8:08 PM, Piotr Praczyk <[email protected]> wrote:

> This is not the use case of figures from scientific publications (If I
> understand correctly, looks rather like digitalisation of entire documents),
> though seems to be relevant for Invenio/Inspire in general. Looks like a
> nice benchmark of the underlying data-structures.

OK, I understand. And I agree.

> > Speaking in concrete words: in my experience quite every time I saw,
> > - descriptive-metadata (like MARC) managed on one side with specific
> >   (multiple) identifiers (..also modifiable identifiers, in the
> >   collaborative systems..)
> > - digital-repositories, on the other side, with specific (stable!!)
> >   identifiers for digital-objects and their component files,
> > - and, in the middle, digital-metadata (like METS) which guarantee the
> >   connections (regardless the physical file storage).
>
> I think, I did not understand this part.

I used this example only to argue that, in the field of book digitization, descriptive metadata can change continuously, while digital metadata usually remain stable. (That is why we benefit from keeping standards like MARC and METS separate, one on each side.)

> What are the cases of modifiable identifiers inside MARC ? Titles of
> documents + authors ?

It's the "extreme case" I have to deal with in my Invenio tests :-( It happens in collaborative library networks, when two MARC records describe the same publication, perhaps because two different libraries each described their own copies of the book. The two records can be merged (say "B" is chosen and absorbs the copies of "A"), so that in the export from that system I receive the new record ("B plus A's copies") together with a reference to the old record to be replaced (-> "A").
In this case, the Invenio "representation" of the MARC record can change (titles, authors, and even the system identifier), but it keeps its internal Invenio ID, because the publication is the same (so I have to preserve the information added in Invenio, such as user comments or digitizations). In my opinion this does not affect the Invenio data model at all; it only affects the Invenio import procedures. Personally, I worked at the level of BibConvert. But Samuele recently explained (mailing list, 2011/03/31, "RFC: bibupload --merge for WebSubmit") that the merge can be done under human control using the new BibMerge web interface.

> Exact file paths in the file system (as we happen to still have in some
> places in Inspire ?)

No no no: in my little case, please consider Invenio as a service provider that receives multiple data flows, where each record gets a permanent Invenio identifier and permanent pointers to its digitizations. (I hope this answers your question.)

> By link between two do you mean a document identifying the same document
> with both at the same time ?

I simply mean this:
- MARC could point to METS (e.g. using the 856 field for a link to a METS file of the same record).
But it's better if
- METS points to or embeds MARC, and points to the FILEs, describing their features (md5, format, dimensions, URL/URN/URI, ...), their document structure, access rights, etc.

This is from the simple viewpoint of export (and, potentially, import). From the viewpoint of the data managed internally (internally created / modified / only indexed), I know it is a different subject. I like Samuele's expression: "*it would be nice to support METS in importing and exporting, (by storing a side when importing anything that is not understood, so that it can be re-exported)*".

I interpret it this way: Invenio could
- accept a (configurable?) selection of METS profiles for import (after validation?), storage (as an XML blob?), and export;
- and understand a (configurable?)
selection of METS elements (extracted from the blob with something like XPath or XQuery, and stored in Invenio tables?) for its internal management (file pointers, simple document structure, and relations among files [versions, formats, pages]).

> What is physical storage for You ? From the physical storage I wanted to
> abstract exactly by providing such links /object/DOCID

I'm using "physical storage" in the general sense of disks / servers / storage centres / external digitization services / or whatever solution gives access to the referenced files.

What I'm trying to propose is this: Invenio could guarantee
- persistence of identifiers and logical file pointers (like your /object/DOCID) within the metadata; and
- configurable interpretation of the logical pointers.

Example: if a collection of images [/nnnn/CollX_DocY] grows so much that I decide to move all files to a bigger storage system, I will be able to redirect all those pointers to the new system without changing the metadata, only by "re-configuring" the resolver for that collection's pointers. (I know it's a trivial problem that can also be solved at a lower level than the Invenio installation, but I wanted to share a major worry I have seen in digitization projects.)

> I think, the word "version" creates confusion here as version in this sense
> is format in Invenio.
> The version which I was talking about is a number telling, how many times
> object was modified. Maybe revision is a better word.

Thank you: I had really misunderstood. And of course I agree with you about the importance of revisions too. Indeed, I think that feature could also be supported in METS: please take a look at this record of the Library of Congress (http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.09601/contactsheet.html), which contains 2 revisions (actually called "versions" there) of the same picture, each in 2 formats (tiff and jpg).

[..
I'm still looking for an official METS example with many pages (sections / figures), in many formats, with many revisions...]

> I was rather thinking about providing a link between for example 1st
> revision of the full text (whichever format) and 3rd revision of a figure.
> Assigning data to connection between particular revisions will be important
> from the point of view of processing of figures.

Very interesting! If I understand correctly, you want to support the complete workflow on a document, with possible connections between any stage of any component part (rather than only the complete state of the final stage). Well, I really think METS supports that, and it is only a matter of conventional usage of its basic elements.

Referring only to the basic schema of METS (http://sunsite3.berkeley.edu/mets/diagram/ and http://www.loc.gov/standards/mets/docs/mets.v1-9.html), we find that:
- the <file> elements (each with its own specification of *format*, description and technical data) can be organized in any number of nested <fileGrp> elements (of schema type fileGrpType);
- at every level, the USE attribute can record information about the intended usage (master, reference, thumbnails, ...);
- and *revisions* (of each file format) can be recorded in dedicated <fileGrp> elements using the VERSDATE attribute ("*An optional dateTime attribute specifying the date this version/fileGrp of the digital object was created*").

> Looking from my perspective, I think it would be nice to repeat the example
> in custom XML I proposed few mails ago and see if it can be easily
> reproduced..
>

You are right, some concrete examples are needed (let me start with official ones only):
- The LoC record above has a METS file which contains all the most important information (http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.09601/mets.xml): please look at the last 35 lines (and at the wrapped MARC description, lines 84-291).
- The METS+PREMIS profile export of the same record (http://www.loc.gov/standards/premis/louis-2-0.xml) adds information about all the events performed for storage (lines 472-639): "validation, ingestion, migration".

[..Maybe a simple re-use of these files, with your data inserted, could be a valid starting example?]

Thanks very much for your attention (I don't know whether I'm annoying the mailing list, so feel free to ask me directly about whatever you think I know).

Cheers
Cristian
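
Addendum (METS file groups): to make the <fileGrp> / USE / VERSDATE idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the nesting convention (one outer <fileGrp> per revision carrying VERSDATE, one inner <fileGrp> per USE), the file IDs and the example.org URLs are all hypothetical, and nothing here is taken from the LoC records or from Invenio code.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical fileSec: two revisions of one figure, each in two formats.
# Assumed convention: outer fileGrp = revision (VERSDATE), inner fileGrp = USE.
SAMPLE = f"""
<mets:fileSec xmlns:mets="{METS_NS}" xmlns:xlink="{XLINK_NS}">
  <mets:fileGrp VERSDATE="2011-05-01T10:00:00">
    <mets:fileGrp USE="master">
      <mets:file ID="FIG1_R1_TIFF" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/object/DOC1/fig1.r1.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="reference">
      <mets:file ID="FIG1_R1_JPG" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/object/DOC1/fig1.r1.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileGrp>
  <mets:fileGrp VERSDATE="2011-06-10T09:30:00">
    <mets:fileGrp USE="master">
      <mets:file ID="FIG1_R2_TIFF" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/object/DOC1/fig1.r2.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="reference">
      <mets:file ID="FIG1_R2_JPG" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/object/DOC1/fig1.r2.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileGrp>
</mets:fileSec>
"""


def list_files(file_sec_xml):
    """Return (versdate, use, mimetype, href) tuples, in document order."""
    ns = {"mets": METS_NS}
    root = ET.fromstring(file_sec_xml.strip())
    rows = []
    # Outer groups are revisions; inner groups carry the USE of their files.
    for revision in root.findall("mets:fileGrp", ns):
        versdate = revision.get("VERSDATE")
        for use_group in revision.findall("mets:fileGrp", ns):
            use = use_group.get("USE")
            for file_el in use_group.findall("mets:file", ns):
                locat = file_el.find("mets:FLocat", ns)
                href = locat.get(f"{{{XLINK_NS}}}href")
                rows.append((versdate, use, file_el.get("MIMETYPE"), href))
    return rows


if __name__ == "__main__":
    for row in list_files(SAMPLE):
        print(row)
```

A link such as "1st revision of the full text with 3rd revision of a figure" could then refer to the file IDs of the two revisions; how to record that link in METS is a separate convention that this sketch does not cover.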

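Addendum (logical pointers): the "re-configuration of the resolver" mentioned for the [/nnnn/CollX_DocY] example could look roughly like the sketch below. This is a minimal illustration in Python, not an Invenio feature: the function name `resolve`, the configuration dictionary, and the storage URLs on example.org are all hypothetical.

```python
# Hypothetical per-collection configuration:
# logical pointer prefix -> base URL of the physical storage currently in use.
RESOLVER_CONFIG = {
    "/nnnn/CollX_": "http://old-storage.example.org/collx/",
}


def resolve(pointer, config):
    """Translate a logical pointer into a physical location.

    The longest configured prefix that matches the pointer wins, so a
    sub-collection can be redirected independently of its parent.
    """
    prefixes = [p for p in config if pointer.startswith(p)]
    if not prefixes:
        raise KeyError(f"no resolver configured for {pointer!r}")
    prefix = max(prefixes, key=len)
    return config[prefix] + pointer[len(prefix):]


if __name__ == "__main__":
    pointer = "/nnnn/CollX_DocY"  # stored in the metadata, never rewritten
    print(resolve(pointer, RESOLVER_CONFIG))

    # Moving the collection to a bigger storage system only changes the
    # configuration entry; the pointer kept in the metadata stays the same.
    RESOLVER_CONFIG["/nnnn/CollX_"] = "http://big-storage.example.org/images/collx/"
    print(resolve(pointer, RESOLVER_CONFIG))
```

The point of the design is simply that the metadata keep only the logical pointer, while the mapping to disks or servers lives in one configurable place.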
