On Thursday, January 24, 2019 at 9:10:06 AM UTC-5, Alan Orth wrote:
>
> We have started looking at our use of metadata across our repositories and
> I have to say that it is very confusing! First some background as I
> understand it, then the current state of affairs in DSpace 4/5/6, and then
> my question(s). :)
>
> Dublin Core is the original specification of fifteen elements from
> 1995[0]. It was amended in 2000 to add element qualifiers like
> "dc.date.issued" as well as a few new elements[1]. These were both
> superseded in 2008 with the introduction of the Dublin Core Terms (aka
> DCTERMS) specification[2], which essentially combines both of them.
>
> By default DSpace makes heavy use of both simple and qualified Dublin Core
> in its input forms, but also provides crosswalks to translate many of these
> to DCTERMS that are then exposed as metadata in the XMLUI[3][4]. It is very
> easy to change the input forms to use different fields and even custom
> namespaces, though some core fields seem to be dangerous (like dc.date.*
> and dc.contributor.author).
>
> Our repository is consumed ravenously by search engines, but also by
> increasingly many harvesters via REST and OAI APIs. If we want to make sure
> that the metadata these harvesters receive is also standards compliant and
> interoperable, shouldn't we update our input-forms and existing item
> metadata to take some of the crosswalks into mind? For example: to start
> using dc.language or dcterms.language instead of dc.language.iso (I would
> of course update the crosswalks accordingly). Does any of this change in
> DSpace 7? Is there any talk of moving away from a flat schema so that
> authors and institutions could be related, for example?
I agree that too little attention is given to interchange, and to how careful we have to be to make machine-to-machine (M2M) communication meaningful without introducing errors and unwarranted assumptions.

Since, as far as I can tell, Dublin Core defines no such thing as dc.language.iso, using dc.language makes sense. There are several DSpace inventions masquerading as QDC (Qualified Dublin Core), and some of them should be moved to a different namespace. No, I haven't yet made a table of my recommended changes.

There is always talk of moving away from a flat namespace. I think this may be unnecessary. Authors, for example, are not contained by institutions. An author writing for, or with the sponsorship of, an institution should cause the authorship of the work to be marked with a relationship to the institution; that is, one object references another by unique identifier. It may be convenient to express this externally in a hierarchical form (e.g. in METS), but that is merely for interchange. Internally we should represent knowledge more flexibly, so that we can produce whatever external representation is required without having to reverse too many assumptions. An author's employment and membership history would properly belong in a biographical repository, which (were such a thing to exist) would have quite a different sort of metadata structure. Given a sufficiently rich set of simple types, most of what we know about an author, a work, or an institution should be usefully representable as simple lists.

We also need to look at enriching our internal representation, but in a different way. I think it was Mark Diggory who observed that we talk about "metadata schemas" as if we had them, but what we really have is namespaces. A schema tells you not only what fields are defined and how they are named, but also what kind of data each field may hold and which values of that kind are acceptable. A well-written schema will guarantee that the value stored in a field makes sense.
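To make the distinction between a namespace and a schema concrete, here is a minimal sketch, in Java since that is what DSpace is written in, of a schema that records a type for each field and a value service that refuses values the type cannot accept. Everything in it is hypothetical: TypedSchemaSketch, FieldType, isValid and store are illustrative names, not DSpace APIs, and treating a DATE field as an ISO 8601 instant or calendar date is an assumption made only for this example.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Map;

public class TypedSchemaSketch {

    // Hypothetical value types a schema could assign to a field.
    enum FieldType { STRING, DATE }

    // Hypothetical schema: maps a qualified field name to its declared type.
    static final Map<String, FieldType> SCHEMA = new HashMap<>();
    static {
        SCHEMA.put("dc.title", FieldType.STRING);
        SCHEMA.put("dc.date.accessioned", FieldType.DATE);
    }

    // True if the field is defined and the value is acceptable for its declared type.
    static boolean isValid(String field, String value) {
        FieldType type = SCHEMA.get(field);
        if (type == null) {
            return false; // field is not defined in this schema
        }
        return type != FieldType.DATE || parsesAsDate(value);
    }

    // Assumption: a DATE is an ISO 8601 instant ("2019-01-24T14:10:06Z")
    // or a plain calendar date ("2019-01-24").
    static boolean parsesAsDate(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            // not an instant; try a plain calendar date next
        }
        try {
            LocalDate.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Hypothetical value-service entry point: rejects values the schema does not
    // accept, regardless of what the input form let through.
    static void store(String field, String value) {
        if (!isValid(field, value)) {
            throw new IllegalArgumentException(
                "Value \"" + value + "\" is not acceptable for " + field);
        }
        System.out.println("stored " + field + " = " + value);
    }

    public static void main(String[] args) {
        store("dc.date.accessioned", "2019-01-24T14:10:06Z");
        try {
            store("dc.date.accessioned", "Louisiana");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the type lives in the schema rather than in an input form, a form could look up a field's declared type from SCHEMA and choose a suitable widget instead of encoding that knowledge itself.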
If we declare that dc.date.accessioned is a date, then even if the UI mistakenly accepts a value of "Louisiana" as a date, the metadatavalue service would not, because that value cannot be understood as a date. We might store the value of dc.date.accessioned as a string encoding of a date, or we might store it as a serialized Calendar; either way, the schema tells us how to interpret the stored bytes and informs external interfaces how they might represent the field's value.

Another way of looking at this: we encode information in form definitions that might instead be pushed into the metadata schemas (if we had them) and fetched by the form interface, which would simplify the writing of forms.

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.