Hello,
I'm writing this message to 3 database/research mailing lists that I'm on,
although it is actually following up topics discussed during the last few days
on perl-ged...@perl.org. However, I've made this self-contained enough that you
shouldn't need to have read those first, though it might benefit you.
I'm going to take a few minutes now and give a bit of flesh to what I said a few
days ago regarding a database project I started designing in 1998 which,
while not specific to genealogy, is intended to be vastly better at genealogy
than any genealogy-specific data format or program I am currently aware of.
Consider this project to be relatively high-level or domain-specific with
respect to databases, much as a genealogy-specific database would be.
This higher-level project pre-dates my work on, and has presently stalled in
favor of, "Muldis D" and my newer relational database projects, which are lower
level and have a different target user base. That said, one reason for the
stall is that I intend to implement the higher-level project over Muldis D;
while strictly speaking it doesn't need to be, I think it would be better served
by that than by SQL or something else.
"Muldis L" is the name that I will use now to refer to the
higher-level/genealogy project.
I made up that name just now, so you're the first to have heard of it, and
there's a reasonable chance I'll actually use it. I probably had, but forgot,
some other name for the project before. The "Muldis" is a common brand for all
my projects, which is a contraction of "Multiple Universe of Discourse", very
appropriate for database projects or philosophy. The "L" is the descriptive
part and is short for "Logos" which means knowledge. The name will always be
used in full, so the fact some other things are named "L" should be of no
consequence. I could also call it "Muldis Logos". Or I could use both, with
the shorter name being the name of the data interchange format I propose, and
the longer name being the reference implementation program that uses it. I am
open to picking a different name (provided that it starts with "Muldis") if
people want to propose one at any time. (I rejected "Muldis R" as ambiguous.)
Muldis L is focused on research data in the abstract, whether genealogical or
historical or scientific or legal, basically in any context where people care
about accuracy. It is aimed at professionals, who should be best able to
appreciate its main or semi-unique features, but laypeople should also
benefit from it. It is also aimed at creative types who are fashioning new
realities for storytelling, to help organize details of their worlds. It is
also aimed at archivists and librarians for cataloging their materials. It is
intended to be better than any tool specific to any of these tasks.
I am sharing these design ideas with the best academic or open source intent, so
you can benefit from them in projects you might make before mine gets going.
And so please give me appropriate credit, should you adapt any of these ideas
either because you learned them from me or because you consider them novel.
Muldis L has several tentpoles, which are features or qualities that are among
those I consider the most important for it to have, and centering on them
helps to distinguish the project from others.
1. One tentpole, previously mentioned, is that the database is conceived not as
recording actual facts, but rather assertions or statements. We are never
completely sure that something is true or false, but rather just that there is
agreement or not. So the database is not saying "this is true", but rather it
says "X says Y and W says V, and so we (the database) say M is N", where some of
those letters may be the same. A good research database needs to be able to
store and organize contradictory information, such as a parent being younger
than their child, rather than throwing up its hands and saying it is an error.
As you probably know, in real life there can be many sources for related
subjects, and it is often the case that they may contradict each other. We need
to be able to record all sources and what they say, even if they don't agree.
Moreover, we can organize sets of assertions that are likely to be quite
different or contradictory but that we care about in the present, such as the
testimony of witnesses in a court case, so we can differentiate what each
witness says from the case history that we would piece together and treat as a
running assumption.
2. One tentpole is that the data structure is recursively self-referential,
with respect to data and citations for that data. Every bit of data in Muldis L
is a statement of perceived fact according to, or an assertion by, some source,
and that includes any recorded details about the sources themselves. So, for
example, that a person June exists is one cited detail, and each of their
details such as their name or birthdate or home address or whatever is
separately cited with one or more sources (they can be collected for
efficiency). Now say that a source for information on June is a book on the
history of the Summers family. Details on that book such as its title or
authors or publication date or its own cited sources are also details that our
database individually cites; so we record, this is the book's title, this is
what it says, and this is what it cites as its own sources, and so on. Now we
may not actually have a copy of that book ourselves; we may just know about it
because we have some review of that book in our hands, and so everything we
claim the book to have said, and by extension about June Summers, is
connected to the book review. In other words, Muldis L is really a model of "he
said she said they said ...", no more and no less. See also tentpole #3.
3. One tentpole is that we have at least 3 main categories of sources or source
citations. The first, and the one that you'd normally think of, is the external
source, such as any artifact you happen to have or person you interviewed, etc.
The second, a category of
its own, is "first-hand experience"; this is where the user of Muldis L
themselves is asserting that they personally witnessed what they are asserting,
and so it should be considered trustworthy on that basis alone. The third is
"hearsay", where the Muldis L user asserts that something is probably true, but
that they neither witnessed it themselves nor recall where they learned of it;
this is also valuable to record but should be distinguished. While the
first of these 3 categories tends to form a middle node in a graph of citations,
the latter 2 categories tend to form leaves, at least initially. Of course, if
some other Muldis L user cites the Muldis L dataset of the first user, then what
was a leaf becomes a middle node in the new dataset. So these leaf categories
are how tentpole #2 avoids being infinitely recursive.
4. One tentpole is that Muldis L users can define any entities and attributes
and relationships that they want to talk about in their database. If they
choose to have entities like "person" or "marriage" or "city", attributes like
name/birthdate/etc, and relationships like "person participates in marriage" or
"person lives in city", or "person is a sexual child of person", and so on, then
they're representing genealogical data. And so we don't just have certain
pre-defined fields, as GEDCOM does, where anything else you want to say has to
be shoved in a generic comments field or such. And you avoid import
compatibility problems, since every possible tag that a GEDCOM variant may have
is natively representable in the Muldis L database.
5. One tentpole is that entities are abstract to the point that they can also
be defined as sets of other entities, or rather the set/member is a relationship
between 2 point entities, or associated temporally and not just spatially. So,
for example, you could be gathering data on what you or a source initially
believe to be 2 distinct people, and then you later discover that they are 2
aliases for the same person, or vice-versa. So then you could have a
person-like entity for the whole person, and a separate one for each alias. Or,
for example, you could have a separate entity for the same person at each
different stage in their life, so essentially a temporal union, maybe one
describing them as a child and another as an adult, or one describing them
before and another after they changed their legal identity. Or, for an entity
that can have multiple physical
forms or distinct public and private identities, each can be described as a
distinct entity, but they are related by transformation.
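
To make tentpoles 1 through 3 a bit more concrete, here is a minimal Python
sketch. It is not the actual Muldis L design, which does not exist as code yet;
every name in it (SourceKind, Source, Assertion, known_via, citation_chain,
values_by_source) is made up for illustration, and the birth years are invented
sample data. It only shows that each detail is stored as "source says value",
that contradictory values sit side by side, and that a chain of citations ends
at a first-hand or hearsay leaf.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceKind(Enum):
    EXTERNAL = "external"      # an artifact, an interview, another dataset
    FIRST_HAND = "first-hand"  # the user personally witnessed it
    HEARSAY = "hearsay"        # the user believes it, origin forgotten


@dataclass(frozen=True)
class Source:
    label: str
    kind: SourceKind
    # How we learned what this source says; None marks a leaf.
    known_via: Optional["Source"] = None


@dataclass(frozen=True)
class Assertion:
    subject: str     # what the statement is about
    attribute: str
    value: str
    source: Source   # who says so


def citation_chain(source):
    """Walk from a source back to the leaf it ultimately rests on."""
    chain = []
    while source is not None:
        chain.append(source.label)
        source = source.known_via
    return chain


def values_by_source(assertions, subject, attribute):
    """Collect every asserted value, keeping contradictions side by side."""
    return {a.source.label: a.value
            for a in assertions
            if a.subject == subject and a.attribute == attribute}


# Hypothetical data: we only hold a review of the book, not the book itself.
in_hand = Source("user holds the review", SourceKind.FIRST_HAND)
review = Source("book review", SourceKind.EXTERNAL, known_via=in_hand)
book = Source("Summers family history", SourceKind.EXTERNAL, known_via=review)
rumor = Source("user's recollection", SourceKind.HEARSAY)

assertions = [
    # A detail about a source is itself a cited assertion.
    Assertion("Summers family history", "cites", "(its own sources)", review),
    Assertion("June", "birth year", "1950", book),
    Assertion("June", "birth year", "1952", rumor),  # contradicts the book
]

print(citation_chain(book))
print(values_by_source(assertions, "June", "birth year"))
```

Here both birth years are kept and neither is treated as an error; deciding
which one "we (the database)" say would be one more assertion with its own
source.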
There are some other important attributes, but I think I've hit on most of the
primary ones above.
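
Tentpoles 4 and 5 can be sketched in the same rough way. Again this is only an
illustration under my own assumptions, not a real Muldis L interface: the
Dataset class, its method names, the entity ids, and the alias name
"J. Summers" are all hypothetical, and the per-detail citations from the
earlier tentpoles are left out to keep it short. The point is that entity
kinds and relationship names are just user-chosen values, and that discovering
two people to be aliases adds an entity and relationships rather than merging
or deleting anything.

```python
from collections import defaultdict


class Dataset:
    """Toy store where entity kinds and relationship names are user-chosen."""

    def __init__(self):
        self.entities = {}                   # entity id -> kind
        self.attributes = defaultdict(dict)  # entity id -> {name: value}
        self.relationships = []              # (name, from id, to id)

    def add_entity(self, entity_id, kind, **attributes):
        self.entities[entity_id] = kind
        self.attributes[entity_id].update(attributes)

    def relate(self, name, from_id, to_id):
        self.relationships.append((name, from_id, to_id))

    def related(self, name, to_id):
        """Ids of every entity that has the named relationship to to_id."""
        return [f for (n, f, t) in self.relationships
                if n == name and t == to_id]


db = Dataset()

# Nothing here is built in; "person", "city" and "lives in" are user choices.
db.add_entity("p1", "person", name="June Summers")
db.add_entity("p2", "person", name="J. Summers")
db.add_entity("c1", "city")
db.relate("lives in", "p1", "c1")

# Later we conclude p1 and p2 are aliases of one person. Both stay as they
# are, and a third entity stands for the whole person.
db.add_entity("p3", "person")
db.relate("alias of", "p1", "p3")
db.relate("alias of", "p2", "p3")

print(db.related("alias of", "p3"))
```

The same "relate" call could just as well express a temporal union ("child
stage of", "adult stage of") or a transformation between forms, since the
relationship name is only data.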
I look forward to getting to implement this after I'm further along on Muldis D
/ relational projects, themselves independently useful, and can move on.
Thank you.
-- Darren Duncan
_______________________________________________
muldis-db-users mailing list
muldis-db-users@mm.darrenduncan.net
http://mm.darrenduncan.net/mailman/listinfo/muldis-db-users