Hello,

I'm writing this message to 3 database/research mailing lists that I'm on, although it is actually following up topics discussed during the last few days on perl-ged...@perl.org. However, I've made this self-contained enough that you shouldn't need to have read those first, though it might benefit you.

I'm going to take a few minutes now to flesh out what I said a few days ago regarding a database project I started designing in 1998 which, while not specific to genealogy, is intended to be vastly better at genealogy than any genealogy-specific data format or program I am currently aware of.

Consider this project to be relatively high-level or domain-specific with respect to databases, such as a genealogy-specific database would be.

This higher-level project pre-dates my work on, and has presently stalled in favor of, "Muldis D" and my newer relational database projects, which are lower level and have a different target user base. That said, one reason for the stall is that I intend to implement the higher-level project over Muldis D; while strictly speaking it doesn't need to be, I think it would be better served by that than by SQL or something else.

"Muldis L" is the name that I will now use to refer to the higher-level/genealogy project.

I made up that name just now, so you're the first to hear it, and there's a reasonable chance I'll actually use it. I probably had, but forgot, some other name for the project before. "Muldis" is the common brand for all my projects; it is a contraction of "Multiple Universe of Discourse", which is very appropriate for database projects or philosophy. The "L" is the descriptive part and is short for "Logos", which means knowledge. The name will always be used in full, so the fact that some other things are named "L" should be of no consequence. I could also call it "Muldis Logos". Or I could use both, with the shorter name being the name of the data interchange format I propose, and the longer name being the reference implementation program that uses it. I am open to picking a different name (provided that it starts with "Muldis") if people want to propose one at any time. (I rejected "Muldis R" as ambiguous.)

Muldis L is focused on research data in the abstract, whether genealogical or historical or scientific or legal, basically any context where people care about accuracy. It is aimed at professionals, who should be best able to appreciate its main or semi-unique features, but laypeople should also benefit from it. It is also aimed at creative types who are fashioning new realities for storytelling, to help organize details of their worlds. It is also aimed at archivists and librarians for cataloging their materials. It is intended to be better than any tool specific to any of these tasks.

I am sharing these design ideas with the best academic or open source intent, so you can benefit from them in projects you might make before mine gets going. And so please give me appropriate credit, should you adapt any of these ideas either because you learned them from me or because you consider them novel.

Muldis L has several tentpoles: features or qualities that I consider among the most important for it to have, and whose centrality helps to distinguish the project from others.

1. One tentpole, previously mentioned, is that the database is conceived not as recording actual facts, but rather assertions or statements. We are never completely sure that something is true or false; we only know whether there is agreement. So the database is not saying "this is true", but rather "X says Y, and W says V, and so we (the database) say M is N", where some of those letters may be the same. A good research database needs to be able to store and organize contradictory information, such as a parent being younger than their child, rather than throwing up its hands and calling it an error. As you probably know, in real life there can be many sources for related subjects, and they often contradict each other. We need to be able to record all sources and what they say, even if they don't agree. Moreover, we can organize sets of assertions that are likely to be quite different or contradictory but that we care about in the present, such as the statements of witnesses in a court case, so we can differentiate what each witness says from the case history that we would piece together and treat as a running assumption.

2. One tentpole is that the data structure is recursively self-referential, with respect to data and citations for that data. Every bit of data in Muldis L is a statement of perceived fact according to, or an assertion by, some source, and that includes any recorded details about the sources themselves. So, for example, that a person June exists is one cited detail, and each of their details such as their name or birthdate or home address or whatever is separately cited with one or more sources (they can be collected for efficiency). Now say that a source for information on June is a book on the history of the Summers family. Details on that book such as its title or authors or publication date or its own cited sources are also details that our database individually cites; so we record: this is the book's title, this is what it says, this is what it cites as its own sources, and so on. Now we may not actually have a copy of that book ourselves; we may just know about it because we have some review of that book in our hands, and so everything we claim the book says, and by extension everything we claim about June Summers, is connected to the book review. In other words, Muldis L is really a model of "he said she said they said ...", no more and no less. See also tentpole #3.

3. One tentpole is that we have at least 3 main categories of sources or source citations. The first is the external source, such as any artifact you happen to have or person you interviewed, basically what one typically thinks of as a source. The second, a category of its own, is "first-hand experience"; this is where the user of Muldis L is asserting that they personally witnessed what they are asserting, and so it should be considered trustworthy on that basis alone. The third is "hearsay", where the Muldis L user asserts that something is probably true, but they neither witnessed it themselves nor recall where they learned of it; this is also valuable to record but should be distinguished. While the first of these 3 categories tends to form a middle node in a graph of citations, the latter 2 categories tend to form leaves, at least initially. Of course, if some other Muldis L user cites the Muldis L dataset of the first user, then what was a leaf becomes a middle node in the new dataset. So this is how tentpole #2 avoids being infinitely recursive.

4. One tentpole is that Muldis L users can define any entities and attributes and relationships that they want to talk about in their database. If they choose to have entities like "person" or "marriage" or "city", attributes like name/birthdate/etc, and relationships like "person participates in marriage" or "person lives in city" or "person is a sexual child of person", and so on, then they're representing genealogical data. And so we don't just have certain pre-defined fields, as GEDCOM does, where anything else you want to say has to be shoved into a generic comments field or such. And you avoid import compatibility problems, since every possible tag that a GEDCOM variant may have is natively representable in the Muldis L database.

5. One tentpole is that entities are abstract to the point that they can also be defined as sets of other entities (or rather, set membership is a relationship between 2 point entities), and they can be associated temporally and not just spatially. So, for example, you could be gathering data on what you or a source initially believe to be 2 distinct people, and then you later discover that they are 2 aliases for the same person, or vice-versa. So then you could have a person-like entity for the whole person, and a separate one for each alias. Or, for example, you could have a separate entity for the same person at a different stage in their life, so essentially a temporal union, maybe one describing them as a child and another as an adult, or one describing them before and another after they changed their legal identity. Or, for an entity that can have multiple physical forms or distinct public and private identities, each can be described as a distinct entity, but they are related by transformation.
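
To make tentpoles #1 to #3 a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the Muldis L design or format: the names Source, SourceKind, Assertion, values_claimed and citation_chain are all hypothetical, as are the birth years, the book title and the "family lore" source. It shows contradictory assertions being kept side by side, and a chain of citations being followed until it ends at a first-hand or hearsay leaf.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    EXTERNAL = "external"      # artifact, interview, book, someone else's dataset
    FIRST_HAND = "first-hand"  # the user personally witnessed it
    HEARSAY = "hearsay"        # believed, but the user can't recall the origin


@dataclass(frozen=True)
class Source:
    name: str
    kind: SourceKind


@dataclass(frozen=True)
class Assertion:
    """One statement of the form 'source says: subject's attribute is value'."""
    source: Source
    subject: str    # any user-defined entity, including another source's name
    attribute: str  # any user-defined attribute or relationship
    value: str


def values_claimed(assertions, subject, attribute):
    """Map each claimed value to the sources asserting it; conflicts are kept."""
    claims = {}
    for a in assertions:
        if a.subject == subject and a.attribute == attribute:
            claims.setdefault(a.value, []).append(a.source.name)
    return claims


def citation_chain(assertions, source):
    """Follow 'who told us about this source' until a leaf source is reached."""
    chain = [source]
    seen = {source.name}  # guards against circular citations
    while chain[-1].kind is SourceKind.EXTERNAL:
        backing = next(
            (a.source for a in assertions
             if a.subject == chain[-1].name and a.source.name not in seen),
            None,
        )
        if backing is None:
            break  # nothing recorded about this source yet
        chain.append(backing)
        seen.add(backing.name)
    return chain


# Hypothetical example data, loosely following the June Summers story above.
me = Source("me", SourceKind.FIRST_HAND)
lore = Source("family lore", SourceKind.HEARSAY)
review = Source("book review", SourceKind.EXTERNAL)
book = Source("Summers family history", SourceKind.EXTERNAL)

assertions = [
    Assertion(me, "book review", "exists", "copy in hand"),
    Assertion(review, "Summers family history", "title",
              "History of the Summers Family"),
    Assertion(book, "June", "birthdate", "1950"),
    Assertion(lore, "June", "birthdate", "1952"),
]

print(values_claimed(assertions, "June", "birthdate"))
print([s.name for s in citation_chain(assertions, book)])
```

In this sketch both birth years for June are kept, each tagged with who said it, and the book's claims trace back through the review to the user's own first-hand possession of that review. Subjects and attributes are plain strings here only to hint at tentpole #4; a real design would presumably let users declare their own entity and relationship types.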

There are some other important attributes, but I think I've hit on most of the primary ones above.

I look forward to implementing this once I'm further along on Muldis D and my relational projects, which are themselves independently useful, and can move on.

Thank you.

-- Darren Duncan
_______________________________________________
muldis-db-users mailing list
muldis-db-users@mm.darrenduncan.net
http://mm.darrenduncan.net/mailman/listinfo/muldis-db-users
