Dear Daria,
Thank you for this message! I think we all agree with your comment, but
sadly, people seem to forget that more and more, being blinded by the
flood of digitally available information.
When I wrote:
"Using an archival identifier for a paper copy, a removable digital
medium or a URL for a file on a machine, in all cases the maintainers of
the archive must guarantee that the identifier will be uniquely
connected with the content. Otherwise, using a URL in a KB is simply
inadequate."
I did not want to open this discussion. For 500 years the disciples of
Buddha relied on oral tradition by many people as being more reliable
than the written form.
Here is an analysis we did many years ago, with an analytical model:
Petraki, M. (2005). Evaluating the reliability of system configurations
for long term digital preservation
<http://elocus.lib.uoc.gr/dlib/0/f/d/metadata-dlib-2005petraki_mer.tkl>.
(pdf <https://publications.ics.forth.gr/_publications/Petraki.pdf>).
But true preservation needs continuous control of whether the words are
still understood (as in oral tradition).
This effort can be made only for a limited set of things, so we need a
selection of what we want to remember. This requires an understanding of
our cultures. Do we have it?
Isn't it?
Best,
Martin
On 9/17/2023 9:42 AM, Гук Дарья Юрьевна wrote:
Dear all,
being out of the discussion (for too many reasons), I can add a
metaphorical opinion.
The poem about Gilgamesh is known from clay pieces, broken, copied and
fixed with additions, and we are able to read it only with an
interpreter. The same holds for any file: machine-readable data, which
are fixed physically and accessed only by a computer interpreter and
electricity. Be sure, most files, even printed, make no sense to a
human. Are you interested in where the cleaning lady's rag is? Never!
Although it's ethnography. Conclusion: files exist only for robots.
Copies of video and sound as part of heritage are another thing,
presenting only a specific part, but be sure, useless a century
after creation. Maybe we need to characterize them by duration of use?
With kind regards,
Daria Hookk
------------------------------------------------------------------------
*From:* Crm-sig <crm-sig-boun...@ics.forth.gr> on behalf of Martin Doerr via
Crm-sig <crm-sig@ics.forth.gr>
*Sent:* 16 September 2023, 0:08:46
*To:* crm-sig
*Subject:* [Crm-sig] Issue 490: how to model a file [HW reminder]
Dear All,
Let me summarize the discussion about issue 490 between George,
Christian-Emil and me, to be discussed in the next meeting:
"How to model a file" may be too vague.
There are three aspects:
A) What constructs are needed in the CRM ontologically to refer to the
unique content of a file.
B) What constructs are needed to refer unambiguously to a resource
that changes content. This is modeled in CRMpem as "Volatile Dataset",
and will not be discussed in this issue.
C) How to connect in a knowledge base to a materialized content
description.
About A):
We take a file (see also Persistent Dataset in CRMpem) in the sense of
an immaterial E73 Information Object as a unique sequence of symbols
that can be machine-encoded, regardless of what groups of bits constitute
one of the symbols of interest in this object.
In the KB: The intended identity can be represented by a URI.
We take a file in the sense of a material copy on a digital medium as
a kind of "E25 Human-Made Feature", regardless of whether it is on a
*local* installation, in a "*cloud*" cluster of machines, a *LOCKSS*
federation of copies, or on a *removable* carrier.
In the KB: We may refer to the material copy by an *external URL*,
or create an *E62 String* in a KB or within an RDF file, or use a
platform-internal "*BLOB mechanism*" with whatever kind of identifier
the platform uses to refer to the local copy.
Ontologically, it is irrelevant for the intended immaterial content if
the copy is printed or scribbled on paper or on a digital medium (or
even a Morse sound track), as long as the material form is
unambiguous with respect to the intended content. Both paper and
digital media can have errors. The CIDOC CRM v7.1 can be printed on
paper and in principle be reentered manually into a file loss-free.
In the KB: We may refer to a paper copy or a removable medium by an
archival identifier.
About C)
Using an archival identifier for a paper copy, a removable digital
medium or a URL for a file on a machine, in all cases the maintainers
of the archive must guarantee that the identifier will be uniquely
connected with the content. Otherwise, using a URL in a KB is simply
inadequate.
The DOI organisation foresees penalties for users that change the
content of a URL associated with a DOI. There is no other solution.
DOI *automatically redirects* from the DOI URI to the guaranteed URL.
The property P190 has symbolic content is used to connect a
machine-encodable information object to a KB internal string.
*Similarly*, we want to refer to the content of an information object
via an *external* copy, digital or not, via a *URL or archival
identifier*. Therefore we propose the following property:
**New proposal:**
*Pxxx has representative copy*
Domain:
E90 Symbolic Object
Range:
E25 Human-Made Feature
Subproperty of:
E90 Symbolic Object. P128i is carried by (carries): E18 Physical Thing
Quantification:
many to many (0,n:0,n)
Scope note:
This property associates an instance of E90 Symbolic Object with a
complete, identifying representation of its content in the form of a
sufficiently readable instance of E25 Human-Made Feature, including,
in particular, representations on electronic media, regardless of whether
they reside internally in clusters of electronic machines, such as in
so-called cloud services, or on removable media.
This property only applies to instances of E73 Information Object that
can completely be represented by discrete symbols, in contrast to
analogue information. The representing object may be more specific
than the symbolic level defining the identity condition of the
represented. This depends on the type of the information object
represented. For instance, if a text has type "Sequence of Modern
Greek characters and punctuation marks", it may be represented in a
formatted file with particular fonts on a particular machine, meaning
however only the sequence of Greek letters. Any additional analogue
elements contained in the representing object will not be regarded as
part of the represented.
As another example, if the represented object has type "English words
sequence", American English or British English spelling variants may
be chosen to represent the English word "colour" without defining a
different symbolic object.
In a knowledge base, typically, the represented object will appear as
a URI without a corresponding file, whereas the representing one may
appear by the URL of a binary encoded file existing outside the
knowledge base proper, or by the archival identifier of a paper
edition. A URL for identifying the copy itself in a knowledge base
should only be used as long as the providers support the persistence
of that copy under this URL, as is current practice for "Linked
Open Data". Associating the referred copy with a checksum in the
knowledge base may help safeguard the maintainers against
unexpected change of content under this URL. If more than one
representative copy is referred to, the maintainers should control
their mutual consistency at the symbolic level of the object intended
to be represented.
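
The checksum idea in the scope note could be sketched roughly as follows. This is only an illustration, not part of the proposal: SHA-256 is an assumed choice of checksum, the function names are hypothetical, and the byte strings stand in for whatever content is actually found under a URL.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the bytes of a copy."""
    return hashlib.sha256(data).hexdigest()


def copy_unchanged(data: bytes, recorded_checksum: str) -> bool:
    """True if the bytes retrieved now still match the checksum recorded in the KB."""
    return sha256_of(data) == recorded_checksum


# Hypothetical content standing in for the bytes found under a URL.
original = b"Definition of the CIDOC CRM"
recorded = sha256_of(original)  # value the KB would keep next to the URL

print(copy_unchanged(original, recorded))
print(copy_unchanged(original + b" (edited)", recorded))
```

Note that such a check works at the bitwise level only. Two representative copies in different formats would have different checksums even when they agree at the symbolic level intended to be represented.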
Examples:
Definition of the CIDOC Conceptual Reference Model Version 7.1.1 (E73)
/has representative copy/ The content under
https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf
(E25) on the server of ICS-FORTH in Heraklion, Greece.
[The edition 7.1.1 of the CIDOC CRM is registered under the public URI
"https://doi.org/10.26225/FDZH-X261", which redirects users to the
representative copy under
https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf.
ICS-FORTH as organisation is responsible to the DOI Foundation for the
persistence of this content under the respective URL.]
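
In a knowledge base, the example above might be written down roughly as in the following sketch. It makes several assumptions: the DOI URI is taken as the identifier of the information object, the class local names follow the usual CRM RDFS naming, and "Pxxx_has_representative_copy" is only the placeholder name of the proposed property, which has no number yet.

```python
# Namespace of the CIDOC CRM; the local names used below are assumptions.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The immaterial content (E73) and its representative copy (E25).
content = "https://doi.org/10.26225/FDZH-X261"
copy_url = "https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf"

triples = [
    (content, RDF_TYPE, CRM + "E73_Information_Object"),
    (copy_url, RDF_TYPE, CRM + "E25_Human-Made_Feature"),
    # Placeholder property from the proposal, not an assigned CRM property.
    (content, CRM + "Pxxx_has_representative_copy", copy_url),
]


def to_ntriples(statements):
    """Serialise (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


print(to_ntriples(triples))
```

Keeping the content URI and the copy URL as two distinct nodes is what allows more than one representative copy to be attached to the same information object.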
-----------------------------------------------------------------------------------------------------------------------------------------------------
*Note* that the MS Word copy AND the pdf copy of the CRM are regarded
as copies of an *identical symbolic content*, the one we are
interested in!
A *vocabulary of symbolic levels* is still to be defined!
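
To make the notion of a symbolic level more tangible, here is a small sketch of two copies compared at two different levels. The two levels, their names, the lower-casing and the spelling-variant table are all hypothetical choices for illustration; they are not a proposal for the vocabulary itself.

```python
import unicodedata
from typing import Tuple

# Hypothetical table mapping spelling variants to one canonical word.
VARIANTS = {"color": "colour"}


def character_level(text: str) -> str:
    """Identity as a character sequence: Unicode-normalise and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def word_level(text: str) -> Tuple[str, ...]:
    """Identity as a word sequence: lower-case the words and fold spelling variants."""
    words = character_level(text).lower().split()
    return tuple(VARIANTS.get(w, w) for w in words)


british = "The colour  of the vase"
american = "The color of the vase"

# The copies differ as character sequences but agree as word sequences.
print(character_level(british) == character_level(american))
print(word_level(british) == word_level(american))
```

Which comparison is the right one depends on the type of the information object represented, as in the "English words sequence" example of the scope note.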
IF an instance of *E73 Information Object* is referred to in a KB via
*a (persistent) URL*, I would regard this as a compression of URI -
Pxxx - URL. This practice would not allow for the distinction between
bitwise identity and a higher symbolic form.
Partners of this homework please comment if I have missed something!
Best,
Martin
--
------------------------------------
Dr. Martin Doerr
Honorary Head of the
Center for Cultural Informatics
Information Systems Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas (FORTH)
N.Plastira 100, Vassilika Vouton,
GR70013 Heraklion,Crete,Greece
Vox:+30(2810)391625
Email:mar...@ics.forth.gr
Web-site:http://www.ics.forth.gr/isl