[mdb-users] tentpoles of Muldis L - my genealogy database project

Darren Duncan Fri, 05 Aug 2011 22:03:15 -0700

Hello,

I'm writing this message to 3 database/research mailing lists that I'm on, although it is actually following up topics discussed during the last few days on perl-ged...@perl.org. However, I've made this self-contained enough that you shouldn't need to have read those first, though it might benefit you.

I'm going to take a few minutes now and give a bit of flesh to what I said a few days ago, in regards to a database project I started designing in 1998 which, while not specific to genealogy, is intended to be vastly better at genealogy than any genealogy-specific data format or program I am currently aware of.

Consider this project to be relatively high-level or domain-specific with respect to databases, such as a genealogy-specific database would be.

This higher-level project pre-dates my work on, and has presently stalled in favor of, "Muldis D" and my newer relational database projects, which are lower level and have a different target user base. That said, one reason for the stall is that I intend to implement the higher-level project over Muldis D; while strictly speaking it doesn't need to be, I think it would be better served by that than by SQL or something else.

"Muldis L" is the name that I will use now to refer to the higher-level/genealogy project.

I made up that name right now so you're the first that has heard of it, and there's a reasonable chance I'll actually use it. I probably had, but forgot, some other name for the project before. The "Muldis" is a common brand for all my projects, which is a contraction of "Multiple Universe of Discourse", very appropriate for database projects or philosophy. The "L" is the descriptive part and is short for "Logos" which means knowledge. The name will always be used in full, so the fact some other things are named "L" should be of no consequence. I could also call it "Muldis Logos". Or I could use both, with the shorter name being the name of the data interchange format I propose, and the longer name being the reference implementation program that uses it. I am open to picking a different name (though it must start with "Muldis") if people want to propose such at any time. (I rejected "Muldis R" as ambiguous.)

Muldis L is focused on research data in the abstract, whether genealogical or historical or scientific or legal, basically in any context where people care about accuracy. It is aimed at professionals, who should be best able to appreciate what are its main or semi-unique features, but laypeople should also benefit from it. It is also aimed at creative types who are fashioning new realities for storytelling, to help organize details of their worlds. It is also aimed at archivists and librarians for cataloging their materials. It is intended to be better than any tool specific to any of these tasks.

I am sharing these design ideas with the best academic or open source intent, so you can benefit from them in projects you might make before mine gets going. And so please give me appropriate credit, should you adapt any of these ideas either because you learned them from me or because you consider them novel.

Muldis L has several tentpoles, which are features or qualities that are among what I consider the most important for it to have, and centering on them helps to distinguish the project from others.

1. One tentpole, previously mentioned, is that the database is conceived not as recording actual facts, but rather assertions or statements. We are never completely sure that something is true or false, but rather just that there is agreement or not. So the database is not saying "this is true", but rather it says "X says Y and W says V and so we (the database) say M is N", where some of those letters may be the same. A good research database needs to be able to store and organize contradictory information, such as a parent being younger than their child, rather than throwing up its hands and saying it is an error. As you probably know, in real life there can be many sources for related subjects, and it is often the case that they may contradict each other. We need to be able to record all sources and what they say, even if they don't agree. Moreover, we can organize sets of assertions that are likely to be quite different or contradictory but that we care about in the present, such as witnesses in a court case, so we can differentiate what each witness says from the case history that we would piece together and treat as a running assumption.

2. One tentpole is that the data structure is recursively self-referential, with respect to data and citations for that data. Every bit of data in Muldis L is a statement of perceived fact according to, or an assertion by, some source, and that includes any recorded details about the sources themselves. So, for example, that a person June exists is one cited detail, and each of their details such as their name or birthdate or home address or whatever is separately cited with one or more sources (they can be collected for efficiency). Now say that a source for information on June is a book on the history of the Summers family. Details on that book such as its title or authors or publication date or its own cited sources are also details that our database individually cites; so we record, this is the book's title, this is what it says, and this is what it cites as its own sources, and so on. Now we may not actually have a copy of that book ourselves; we may just know about it because we have some review of that book in our hands, and so everything we claim for the book to have said, and by extension about June Summers, is connected to the book review. In other words, Muldis L is really a model of "he said she said they said ...", no more and no less. See also tentpole #3.

3. One tentpole is that we have at least 3 main categories of sources or source citations. The first and the one that you'd normally think of is the external source, such as any artifact you happen to have or person you interviewed etc, basically what one typically thinks of as a source. The second, a category of its own, is "first-hand experience"; this is where the user of Muldis L themselves is asserting that they personally witnessed what they are asserting, and so it should be considered trustworthy on that basis alone. The third is "hearsay", where the Muldis L user asserts that something is probably true, but that they neither witnessed it themselves nor recall where they learned of it; this is also valuable to record but should be distinguished. While the first of these 3 categories tends to form a middle node in a graph of citations, the latter 2 categories tend to form leaves, at least initially. Of course, if some other Muldis L user cites the Muldis L dataset of the first user, then what was a leaf becomes a middle in the new dataset. So this is how tentpole #2 avoids being infinitely recursive.

4. One tentpole is that Muldis L users can define any entities and attributes and relationships that they want to talk about in their database. If they choose to have entities like "person" or "marriage" or "city", attributes like name/birthdate/etc, and relationships like "person participates in marriage" or "person lives in city", or "person is a sexual child of person", and so on, then they're representing genealogical data. And so we don't just have certain pre-defined fields, as GEDCOM does, where anything else you want to say has to be shoved in a generic comments field or such. And you avoid import compatibility problems since every possible tag that a GEDCOM variant may have is natively representable in the Muldis L database.

5. One tentpole is that entities are abstract to the point that they can also be defined as sets of other entities, or rather the set/member is a relationship between 2 point entities, or associated temporally and not just spatially. So, for example, you could be gathering data on what you or a source initially believe to be 2 distinct people, and then you later discover that they are 2 aliases for the same person, or vice-versa. So then you could have a person-like entity for the whole person, and separately for each alias. Or, for example, you could have a separate entity for the same person at a different stage in their life, so essentially a temporal union, maybe one describing them as a child and another as an adult, or one describing them before and after they changed their legal identity. Or, for an entity that can have multiple physical forms or distinct public and private identities, each can be described as a distinct entity, but they are related by transformation.
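
To make tentpoles #1 through #3 a little more concrete, here is a rough Python sketch of how assertions, source categories and the "he said she said" chain might fit together. This is only an illustration under my own assumptions, not the Muldis L design or any proposed format: the names Source, SourceKind, Assertion, Dataset, claims and provenance are all hypothetical, as are the sample values such as the birth years.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    # The 3 source categories of tentpole #3 (names are illustrative).
    EXTERNAL = "external"      # artifact, interview, book, another dataset
    FIRST_HAND = "first-hand"  # the user personally witnessed it
    HEARSAY = "hearsay"        # believed true, origin not recalled


@dataclass(frozen=True)
class Source:
    name: str
    kind: SourceKind


@dataclass(frozen=True)
class Assertion:
    subject: str    # any user-defined entity, including a source itself
    attribute: str  # any user-defined attribute or relationship
    value: str
    source: Source  # who says so


class Dataset:
    def __init__(self):
        self.assertions = []

    def add(self, subject, attribute, value, source):
        # Contradictory statements are stored side by side, never rejected.
        assertion = Assertion(subject, attribute, value, source)
        self.assertions.append(assertion)
        return assertion

    def claims(self, subject, attribute):
        # Everything that anyone has said about one detail of one entity.
        return [a for a in self.assertions
                if a.subject == subject and a.attribute == attribute]

    def provenance(self, source):
        # Walk from a source to whoever told us about it, stopping at a
        # leaf (first-hand or hearsay) or when nothing more is recorded.
        chain = [source.name]
        seen = {source.name}
        current = source
        while current.kind is SourceKind.EXTERNAL:
            backers = [a.source for a in self.assertions
                       if a.subject == current.name]
            nxt = next((s for s in backers if s.name not in seen), None)
            if nxt is None:
                break
            chain.append(nxt.name)
            seen.add(nxt.name)
            current = nxt
        return chain


# Hypothetical sample data, loosely following the June Summers example.
me = Source("me", SourceKind.FIRST_HAND)
review = Source("review of Summers history", SourceKind.EXTERNAL)
book = Source("Summers family history", SourceKind.EXTERNAL)
rumor = Source("family rumor", SourceKind.HEARSAY)

db = Dataset()
db.add(review.name, "held by", "me", me)              # I hold the review
db.add(book.name, "described by", review.name, review)  # review describes book
db.add("June", "birth year", "1950", book)            # the book says 1950
db.add("June", "birth year", "1952", rumor)           # hearsay disagrees

print(sorted(a.value for a in db.claims("June", "birth year")))
print(db.provenance(book))
```

In this sketch both birth years are kept, each tied to its own source, and the chain for the book runs through the review and ends at the first-hand leaf, which is where the recursion of tentpole #2 stops.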

There are some other important attributes, but I think I've hit on most of the primary ones above.

I look forward to getting to implement this after I'm further along on Muldis D / relational projects, themselves independently useful, and can move on.


Thank you.

-- Darren Duncan
_______________________________________________
muldis-db-users mailing list
muldis-db-users@mm.darrenduncan.net
http://mm.darrenduncan.net/mailman/listinfo/muldis-db-users
