Dear All,

Here are my attempts to reformulate the objectives of the CRM, its scope, and the methods of extension. Please comment! To be discussed at next week's meeting.

Best,

Martin


 ISSUE 336


     Introduction

This document is the formal definition of the *CIDOC Conceptual Reference Model (“CRM”)*, a formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information and similar information from other domains. The CRM is the culmination of more than two decades of standards development work by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). Work on the CRM itself began in 1996 under the auspices of the ICOM-CIDOC Documentation Standards Working Group. Since 2000, development of the CRM has been officially delegated by ICOM-CIDOC to the CIDOC CRM Special Interest Group, which soon afterwards began collaborating with the ISO working group ISO/TC46/SC4/WG9 to bring the CRM to the form and status of an International Standard. This collaboration has resulted in ISO 21127:2006 and ISO 21127:2014, and will continue in order to produce the next update of the standard. This document belongs to the series of evolving versions of the formal definition of the CRM, which serve the ISO working group as community drafts for the standard. Any minor differences in semantics and notation between the ISO standard text and the CIDOC version, as required and implemented by the ISO working group, are harmonized in subsequent versions of the CIDOC text.


     Objectives of the CIDOC CRM

The primary role of the CRM is to enable the exchange and integration of information from heterogeneous sources for the reconstruction and interpretation of the past at a human scale, based on all kinds of material evidence, including texts, audiovisual material and even oral tradition. It starts from, but is not limited to, the needs of museum documentation and research based on museum holdings. It aims at providing the semantic definitions and clarifications needed to transform disparate, localised information sources into a coherent global resource, be it within a larger institution, in intranets or on the Internet, and to make it available for scholarly interpretation and scientific evaluation. Its perspective is supra-institutional and abstracted from any specific local context. This goal determines the constructs and level of detail of the CRM.

More specifically, it defines, in terms of a formal ontology, the *underlying semantics* of database *schemata* and *structured* documents used in the documentation of cultural heritage and scientific activities. In particular, it defines the semantics related to the study of the past and current state of our world, as is characteristic of museums, but also of other institutions and disciplines. It does *not* define any of the *terminology* that typically appears as data in the respective data structures; however, it provides the characteristic relationships for its use. It does *not* aim to prescribe what cultural institutions *should* document. Rather, it explains the logic of what they actually document, and thereby enables *semantic interoperability.*

It intends to provide a model of the intellectual structure of the respective kinds of documentation in logical terms. As such, it is not optimised for implementation-specific storage and processing aspects. Implementations may lead to solutions where elements and links between relevant elements of our conceptualizations are no longer explicit in a database or other structured storage system. For instance, the birth event that connects elements such as father, mother, birth date, birth place may not appear in the database, in order to save storage space or response time of the system. The CRM allows us to explain how such apparently disparate entities are intellectually interconnected, and how the ability of the database to answer certain intellectual questions is affected by the omission of such elements and links.
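
To make the birth-event example concrete, here is a minimal Python sketch, assuming a toy representation of knowledge as (subject, property, object) triples with human-readable CRM labels. The `ex:` identifiers, the "instance of" label and the helpers `expand_birth` and `parents_of` are hypothetical and do not stand for any particular serialization. It shows how the event that a flat record leaves implicit can be made explicit again, so that questions can be asked through it.

```python
# Hypothetical flat record, as a database might store it to save space:
# the birth event linking these fields is not represented at all.
flat_record = {
    "person": "ex:person_42",
    "father": "ex:person_7",
    "mother": "ex:person_8",
    "birth_date": "1850-03-01",
    "birth_place": "ex:place_heraklion",
}

def expand_birth(rec):
    """Make the implicit E67 Birth explicit as (subject, property, object) triples."""
    birth = "ex:birth_" + rec["person"].split("_")[-1]  # hypothetical identifier scheme
    span = birth + "_timespan"
    return [
        (birth, "instance of", "E67 Birth"),
        (birth, "P98 brought into life", rec["person"]),
        (birth, "P97 from father", rec["father"]),
        (birth, "P96 by mother", rec["mother"]),
        (birth, "P4 has time-span", span),
        (span, "P82 at some time within", rec["birth_date"]),
        (birth, "P7 took place at", rec["birth_place"]),
    ]

def parents_of(person, triples):
    """Answer 'who are the parents?' by navigating through the birth event."""
    births = {s for s, p, o in triples if p == "P98 brought into life" and o == person}
    fathers = [o for s, p, o in triples if s in births and p == "P97 from father"]
    mothers = [o for s, p, o in triples if s in births and p == "P96 by mother"]
    return fathers, mothers

triples = expand_birth(flat_record)
print(parents_of("ex:person_42", triples))
```

The flat record answers the same parent question only because its column names encode it; the explicit event makes the intellectual connection between father, mother, date and place visible and queryable.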


     Scope of the CIDOC CRM

The overall scope of the CIDOC CRM can be summarised in simple terms as the curated, *factual knowledge* about the past at a human scale.

However, a more detailed and useful definition can be articulated by defining both the *Intended Scope*, a broad and maximally inclusive definition of general application principles, and the *Practical Scope*, which is expressed by the overall scope of a growing reference set of specific, identifiable documentation standards and practices that the CRM aims to encompass, restricted in its details to the limits of the Intended Scope.

The reasons for this distinction are twofold. Firstly, the CRM is developed in a “*bottom-up*” manner, starting from well-understood concepts that domain experts actually and widely use, which are disambiguated and gradually generalized as more forms of encoding are encountered. This avoids the misadaptations and vagueness often found in introspection-driven attempts to find overarching concepts for such a wide scope, and lends stability to the generalizations found. Secondly, it is a means to identify and keep a focus on the concepts most needed by the communities working in the scope of the CRM, and to maintain a well-defined agenda for its evolution.

The *Intended Scope* of the CRM may be defined as all information required for the exchange and integration of heterogeneous scientific and scholarly documentation about the past at a human scale and the evidence of it that has come down to us. This definition requires further elaboration:

 * The term “scientific and scholarly documentation” is intended to
   convey the requirement that the depth and quality of descriptive
   information that can be handled by the CRM should be sufficient for
   serious academic research. This does not mean that information
   intended for presentation to members of the general public is
   excluded, but rather that the CRM is intended to provide the level
   of detail and precision expected and required by museum
   professionals and researchers in the field.

 * “Evidence that has come down to us” comprises all types of
   material collected and displayed by museums and related
   institutions, as defined by ICOM[1], and other collections,
   in-situ objects, sites, monuments and intangible heritage relating
   to fields such as social history, ethnography, archaeology, fine and
   applied arts, natural history, and the history of science and technology.

 * The documentation includes the detailed description of individual
   items, in situ or within collections, groups of items and
   collections as a whole, as well as practices of intangible heritage.
   It pertains to their current state as well as to information about
   their past. The CRM is specifically intended to cover contextual
   information: the historical, geographical and theoretical background
   that gives cultural heritage collections much of their cultural
   significance and value.

 * The exchange of relevant information with libraries and archives,
   and the harmonisation of the CRM with their models, fall within the
   Intended Scope of the CRM.

 * Information required solely for the administration and management of
   cultural institutions, such as information relating to personnel,
   accounting, and visitor statistics, falls outside the Intended Scope
   of the CRM.

The Practical Scope[2] of the CRM is expressed in terms of the set of reference standards and de facto standards for documenting factual knowledge that have been used to guide and validate the CRM’s development and its further evolution. The CRM covers the same domain of discourse as the union of these reference standards; this means that for data correctly encoded according to these documentation formats, a CRM-compatible expression can be found that conveys the same meaning.


     Coverage and Extensions

The Intended Scope of the CRM is a subset of the “real” world and is therefore potentially infinite. Further, developing the model bottom-up from a practical scope means that the model will always miss some areas of relevant application or, conversely, that some parts may not be developed in sufficient detail for a specialized field of study, such as /E30 Right/. Therefore, the CRM has been designed to be extensible by different mechanisms in order to achieve optimal coverage of the Intended Scope without losing compatibility with the CRM.

Strict *compatibility of extensions* with the CRM means that data structured according to an extension must also remain valid as a CRM instance. In practical terms, this implies /query containment/: any query based on CRM concepts should retrieve a result set that is correct according to the CRM’s semantics, regardless of whether the knowledge base is structured according to the CRM’s semantics alone or according to the CRM plus compatible extensions. For example, a query such as “list all events” should recall 100% of the instances deemed to be events by the CRM, regardless of how they are classified by the extension.
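
Query containment can be illustrated with a small Python sketch, assuming a toy single-parent class hierarchy and instances each typed by one class. The CRM subclass links shown (E7 Activity and E63 Beginning of Existence under E5 Event, E67 Birth under E63) are a fragment of the CRM; "X1 Excavation" and all `ex:` identifiers are hypothetical. The sketch compares “list all events” with and without the extension's subclass declaration.

```python
# A fragment of the CRM hierarchy plus one hypothetical extension class.
SUBCLASS_OF = {
    "E7 Activity": "E5 Event",
    "E63 Beginning of Existence": "E5 Event",
    "E67 Birth": "E63 Beginning of Existence",
    "X1 Excavation": "E7 Activity",  # hypothetical extension class
}

# The same hierarchy as seen by a CRM-only system that ignores the extension.
CRM_ONLY = {k: v for k, v in SUBCLASS_OF.items() if not k.startswith("X")}

INSTANCES = {
    "ex:e1": "E67 Birth",
    "ex:e2": "E7 Activity",
    "ex:e3": "X1 Excavation",
    "ex:p1": "E21 Person",
}

def ancestors(cls, hierarchy):
    """Return cls followed by all of its superclasses in the given hierarchy."""
    result = [cls]
    while cls in hierarchy:
        cls = hierarchy[cls]
        result.append(cls)
    return result

def instances_of(cls, instances, hierarchy):
    """All instances whose class is cls or a (transitive) subclass of it."""
    return sorted(i for i, c in instances.items() if cls in ancestors(c, hierarchy))

print(instances_of("E5 Event", INSTANCES, SUBCLASS_OF))  # includes the excavation
print(instances_of("E5 Event", INSTANCES, CRM_ONLY))     # misses it
```

With the declaration "X1 Excavation" subclass of "E7 Activity", the query on the CRM concept E5 Event also recalls the instance classified only by the extension; without it, that event is silently lost.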

A sufficient condition for the compatibility of an extension with the CRM is that CRM classes subsume all classes of the extension, and that all properties of the extension are either *subsumed* by CRM properties or are *part of a path* for which a CRM property is a shortcut. Such a condition can ultimately only be tested intellectually, that is, by human judgement of the meaning of the concepts involved.
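
While the ontological correctness of each subsumption can only be judged intellectually, the presence of the required declarations can be checked mechanically. The following Python sketch assumes that an extension lists, for each of its classes and properties, one declared parent (or None), plus a set of properties declared as parts of paths that CRM properties shortcut. All "X" names and the `unanchored` helper are hypothetical; P14 carried out by and the E-classes are CRM terms.

```python
CRM_CLASSES = {"E5 Event", "E7 Activity", "E39 Actor"}
CRM_PROPERTIES = {"P14 carried out by"}

# Hypothetical extension declarations: term -> declared parent (or None).
EXT_CLASSES = {
    "X1 Excavation": "E7 Activity",
    "X3 Rescue Excavation": "X1 Excavation",
    "X2 Stratigraphic Unit": None,  # no CRM superclass declared
}
EXT_PROPERTIES = {
    "X10 directed by": "P14 carried out by",
    "X11 has context note": None,
    "X12 recovered during": None,
}
# Properties the extension declares (hypothetically) as part of a shortcut path.
PATH_MEMBERS = {"X12 recovered during"}

def reaches_crm(term, parent_of, crm_terms):
    """Follow declared parents until a CRM term is reached (cycle-safe)."""
    seen = set()
    while term is not None and term not in seen:
        if term in crm_terms:
            return True
        seen.add(term)
        term = parent_of.get(term)
    return False

def unanchored(ext_classes, ext_properties, path_members):
    """List extension terms lacking the declarations the sufficient condition needs."""
    problems = [c for c in ext_classes if not reaches_crm(c, ext_classes, CRM_CLASSES)]
    problems += [p for p in ext_properties
                 if not reaches_crm(p, ext_properties, CRM_PROPERTIES)
                 and p not in path_members]
    return sorted(problems)

print(unanchored(EXT_CLASSES, EXT_PROPERTIES, PATH_MEMBERS))
```

An empty result only means that every term is formally anchored; whether "X1 Excavation" really is an E7 Activity remains a question for domain experts.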

The mechanisms for extensions are:

1. Existing classes and properties can be extended dynamically using
   thesauri and controlled vocabularies with CRM properties having as
   range /E55 Type/, as further elaborated in the section “About
   Types”. This approach is preferable when specializations of classes
   are independent from specializations of properties, and for local,
   non-standardized concepts.
2. Existing classes and properties can be extended structurally by
   adding subclasses and subproperties respectively. This approach is
   particularly recommended to communities of practice needing
   well-established properties specific to classes that are not present
   in the CRM.
3. Additional information that falls outside the semantics formally
   defined by the CRM can trivially be recorded as unstructured data
   using /E1 CRM Entity. P3 has note: E62 String/ to attach such
   information to the most adequate instance in the respective
   knowledge base. This approach is preferable when detailed, targeted
   queries are not expected; in general, only those concepts used for
   formal querying need to be explicitly modelled.
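
The practical difference between mechanisms 1 and 3 can be sketched in Python, assuming the same toy triple representation as elsewhere. The object identifiers, the vocabulary term `ex:term_amphora`, the note text and both helper functions are hypothetical; E22 Man-Made Object, E55 Type, P2 has type and P3 has note are CRM terms. A type assigned from a controlled vocabulary supports a precise query, whereas a note is reachable only by string matching.

```python
TRIPLES = [
    # Mechanism 1: specialization through a controlled vocabulary term (E55 Type).
    ("ex:obj1", "instance of", "E22 Man-Made Object"),
    ("ex:obj1", "P2 has type", "ex:term_amphora"),
    ("ex:term_amphora", "instance of", "E55 Type"),
    # Mechanism 3: unstructured information attached with P3 has note.
    ("ex:obj2", "instance of", "E22 Man-Made Object"),
    ("ex:obj2", "P3 has note", "Possibly an amphora; handle missing."),
]

def objects_of_type(term):
    """Precise, structured query via P2 has type."""
    return [s for s, p, o in TRIPLES if p == "P2 has type" and o == term]

def notes_mentioning(word):
    """Only a free-text search can reach information recorded as a note."""
    return [s for s, p, o in TRIPLES
            if p == "P3 has note" and word.lower() in o.lower()]

print(objects_of_type("ex:term_amphora"))
print(notes_mentioning("amphora"))
```

The free-text match also finds "possibly" and negated statements, which is why notes suit information for which detailed, targeted queries are not expected.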


     Conservative Extensions of Scope

Extensions may be incorporated into *new versions* of the CRM, or become *semi-independent modules* maintained in parallel to the CRM by communities of practice. In mechanisms 1 and 2 above, the CRM concepts subsume and thereby cover the extensions. Relying on specialization as the only method of extension would presuppose that the CRM had foreseen, from the beginning, all necessary high-level classes and properties. This conflicts with the very successful bottom-up methodology by which the CRM itself has evolved, and with the development of extensions more peripheral to the current practical scope.

Extensions that result from widening the scope, rather than elaborating it in more detail, may well find a class “C” not yet covered by the CRM, and even a superclass “B” of class C that must be regarded as a superclass of an existing CRM class “A”. From a logical-theoretical point of view, we regard such extensions as compatible precisely when the CRM classes and properties subsume all classes and properties of the extension, as long as instances are *restricted to the unextended scope* of the CRM.

In this case, an existing property p of class A may be generalized to a property p’ that also holds for the new superclass B. We call p’ a *conservative extension* of p: when restricted to the original class A, the extended property p’ is identical to the original property p. In general, a superproperty is said to be a conservative extension of a subproperty when it is identical to the subproperty when restricted to the subproperty’s domain and range. In first-order logic, the conservative extension of a property can be expressed as follows. Assume that A and C are subclasses of B and D respectively, and that p and p’ are properties with domain and range A, C and B, D respectively:

A(x) ⊃ B(x)
C(x) ⊃ D(x)
p(x,y) ⊃ A(x)
p(x,y) ⊃ C(y)
p’(x,y) ⊃ B(x)
p’(x,y) ⊃ D(y)

If p’ is a conservative extension of p, then

A(x) ∧ C(y) ∧ p’(x,y) ≡ p(x,y)
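
The definition can be tested on a small finite model. The following Python sketch assumes classes are represented as sets of instances and properties as sets of (x, y) pairs; all instance names and the function `is_conservative_extension` are hypothetical. It checks the typing axioms and then the equivalence above for one interpretation only, which illustrates the definition but proves nothing about the theory in general.

```python
A = {"a1", "a2"}
B = A | {"b1"}            # A ⊆ B
C = {"c1"}
D = C | {"d1"}            # C ⊆ D
p = {("a1", "c1")}
p_ext = p | {("b1", "d1"), ("b1", "c1")}  # p’ may also relate instances of B and D

def is_conservative_extension(A, B, C, D, p, p_ext):
    """Check A(x) ∧ C(y) ∧ p’(x,y) ≡ p(x,y) on a finite interpretation."""
    if not (A <= B and C <= D):
        raise ValueError("A and C must be subclasses of B and D")
    if not all(x in A and y in C for x, y in p):
        raise ValueError("p must have domain A and range C")
    if not all(x in B and y in D for x, y in p_ext):
        raise ValueError("p’ must have domain B and range D")
    # Restrict p’ to the original domain and range and compare with p.
    restricted = {(x, y) for x, y in p_ext if x in A and y in C}
    return restricted == p

print(is_conservative_extension(A, B, C, D, p, p_ext))
print(is_conservative_extension(A, B, C, D, p, p_ext | {("a2", "c1")}))
```

The second call fails because p’ now relates a2 and c1, both within the original scope, although p does not: the extension would change the meaning of the CRM for its own instances.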

This is similar to what in logic is called a conservative extension of a theory. This construct is necessary for effective modular management of ontologies, but RDF/OWL, as currently used, provides no direct way to express it. It has very important *practical consequences*:

1. Taken on its own, the CRM is not affected by such a conservative extension of scope, since it is not concerned with instances of class B that are not in class A.

2. If a conservative extension is incorporated into a *new version* of the CRM, the new version is *backwards compatible* with the previous one (hence the term “conservative”).

3. The *bottom-up* development of ontologies encourages choosing as domain and range of a property not the most general classes conceivable for all future needs, but the *best-understood* ones, leaving it to future conservative extensions to find more general ones.

4. Extensions of the CRM maintained in separate modules that declare classes and/or properties not covered by superclasses and/or superproperties of the CRM *should clearly mark* the highest-level ones, so that a query system can use them to retrieve all instances described in terms of the CRM and the extension modules.

5. Extensions of the CRM maintained in separate modules must be harmonized with the CRM: all ontologically justified relationships of *subsumption* between the CRM and the extension should be *explicitly* declared in the extension or, where appropriate, submitted for consideration for inclusion in the CRM.
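
Consequence 4 can be illustrated with a Python sketch of a query system that retrieves “everything” by combining the CRM top class with the highest-level classes marked by an extension module. The CRM links shown (E21 Person under E39 Actor, under E77 Persistent Item, under E1 CRM Entity) are a fragment of the CRM; "Y1 Natural Process", "Y2 Erosion", the `ex:` identifiers and `retrieve_all` are hypothetical.

```python
CRM_TOP = "E1 CRM Entity"

PARENTS = {
    "E21 Person": "E39 Actor",
    "E39 Actor": "E77 Persistent Item",
    "E77 Persistent Item": "E1 CRM Entity",
    "Y2 Erosion": "Y1 Natural Process",  # hypothetical module, no CRM superclass
}
# Highest-level classes that the hypothetical module marks explicitly.
MARKED_TOP_LEVEL = {"Y1 Natural Process"}

INSTANCES = {"ex:p1": "E21 Person", "ex:g1": "Y2 Erosion"}

def ancestors(cls, parents):
    """The class itself and all of its declared superclasses."""
    out = {cls}
    while cls in parents:
        cls = parents[cls]
        out.add(cls)
    return out

def retrieve_all(instances, parents, roots):
    """Instances falling under at least one of the given top-level classes."""
    return sorted(i for i, c in instances.items() if ancestors(c, parents) & roots)

print(retrieve_all(INSTANCES, PARENTS, {CRM_TOP}))
print(retrieve_all(INSTANCES, PARENTS, {CRM_TOP} | MARKED_TOP_LEVEL))
```

Without the marked top-level class, the query system has no way of knowing that "Y2 Erosion" instances belong to the knowledge base it is asked to cover.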

It is our hope that, over time, the CRM and its compatible extension modules will provide increasingly complete coverage of the Intended Scope as a coherent, logically and ontologically adequate theory of the widest practical use. Among other things, this will require collaboration among the communities involved, based on a continuous effort of mutual understanding and respect.


------------------------------------------------------------------------

[1] The ICOM Statutes provide a definition of the term “museum” at http://icom.museum/statutes.html#2

[2] The Practical Scope of the CIDOC CRM, including a list of the relevant museum documentation standards, is discussed in more detail on the CIDOC CRM website at http://cidoc.ics.forth.gr/scope.html

--
------------------------------------
 Dr. Martin Doerr

 Honorary Head of the
 Center for Cultural Informatics

 Information Systems Laboratory
 Institute of Computer Science
 Foundation for Research and Technology - Hellas (FORTH)

 N.Plastira 100, Vassilika Vouton,
 GR70013 Heraklion,Crete,Greece

 Vox:+30(2810)391625
 Email: mar...@ics.forth.gr
 Web-site: http://www.ics.forth.gr/isl
