Andrew Jaquith wrote:
> You could certainly do that with the package as I've described it --
> the pseudo-subject or facet is what I called a Score "category," if I
> catch your meaning.

Hi Andrew,

I'm not sure what the correct library term would be, except that if you look into Faceted Classification (FC) there's the concept that a subject, rather than being an enumeration (à la Dewey), is composed of facets, with the individual facets themselves composite subjects, eventually drilling recursively down until one reaches some idea of a 'core' facet (though it's easy to argue that no such core exists or even could, that facets are themselves subjects and recurse as well).

    http://en.wikipedia.org/wiki/Faceted_classification

You may know that I have been involved in the Topic Map standards work and by some minor coincidence Steve Pepper's last message into the TopicMapMail mailing list is very much in line with this subject-centric thinking, and I've attached it [1] as it is likely informative.

So 'category' is fine. I was just trying to think of some term that might describe the components that together build a given subject, with FC being a very useful framework.

> I'll take this as a vote of confidence -- when I check it in (in the
> next few weeks probably), you'll be able to see the code for yourself.
> :)

Yes it sounds very promising, thanks!

Murray

[1] Re: [topicmapmail] Weekly binge : Do we care about subjects that much?
    Steve Pepper message of 24 September 2009 into the TopicMapMail
    <[email protected]> mailing list

...........................................................................
Murray Altheim <murray09 at altheim dot com>                       ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

      Boundless wind and moon - the eye within eyes,
      Inexhaustible heaven and earth - the light beyond light,
      The willow dark, the flower bright - ten thousand houses,
      Knock at any door - there's one who will respond.
                                      -- The Blue Cliff Record

--- Begin Message ---
* Patrick Durusau
|
| Err, actually the TMRM actually says one has to declare how subjects are
| identified. No pre-defined mechanisms.
|
| It might be helpful to think of the basis for TMDM merging as a starter
| set of *interchangeable* merging rules. That is you can rely on those to
| operate for any software that implements the TMDM.

From a "philosophical" point of view (maybe "cognitive linguistic" point of view is better), I think the TMDM and TMRM complement each other rather nicely, each providing a mechanism that corresponds to how we as humans solve the subject identity problem.

We all walk around with a vast number of concepts up in our heads, each more or less clearly defined. Those concepts are all connected through relationships in an enormous network, and a concept is really nothing more than the sum of its connections.

In theory we can refer to concepts via their connections. For example, I could say "the capital of the country in which I was born" and those of you who know me would know what I meant. But this kind of thing is long-winded and awkward, so we go to the trouble of giving *names* to the most salient concepts. It then becomes much easier to convey the concept to others: clearly it's much more economical to say "London" (not to mention less dependent on rather specific encyclopedic knowledge, such as who I am and where I was born).

The only problem with names, of course, is that they are not unique. (In the context of American literature, "London" could refer to the author of "The Iron Heel".) But they work pretty well for humans, because we are able to use context to disambiguate. When context-based disambiguation fails, as it sometimes does, we usually discover the miscommunication at some point and are able to back-track and fix the error - using one or more properties to disambiguate. (I remember being with Bernard Vatant in a bar in Austin, Texas when he told one of the locals that he was from Paris.
We didn't realize that he had been misunderstood until some time later when the guy said, "Oh you mean Paris, France..." Computers, of course, are not that smart yet, and won't be for a long time.)

Subject identifiers are like names: They are simply conventional symbols that are used to stand in for the concepts (subjects) we wish to refer to. The big difference is that they are globally unique and therefore much more suitable for computers than the names we humans use. Subject identifiers are the (primary) mechanism offered by the TMDM for subject identification (subject locators and item identifiers are secondary and only used for special purposes).

And just as humans could (in theory) do without names, we could (in theory) do without subject identifiers, and instead base all our subject identification on properties like "born in" and "located in". This is essentially the TMRM approach. Conceptually it works; in practice, it usually doesn't, or at least only in very limited ways.

In one sense the TMRM approach underlies what we usually do anyway when we create a subject identifier: we conceptualize some subject in our head (on the basis of all its connections) and then capture just enough of the most salient relationships in the subject descriptor.[1]

The TMRM approach is also what we fall back on when we merge topic maps that don't share subject identifiers - we compare properties: social security numbers, email addresses, data codes (in combination with the topic type - another property - so as not to merge, say, Norway with Nordfjörður Airport, both of which have the code "NOR").

But this really is very much a fallback, for the simple reason that the set of properties necessary to identify a subject is usually not present in both topic maps. For example, we could use the following tolog-NG query:

    MERGE $T1, $T2 FROM
      instance-of($T1, city),
      instance-of($T2, city),
      capital-of($T1 : city, England : country),
      capital-of($T2 : city, England : country)?

But this *only works* if the capital-of assertion is made about both T1 and T2. Only exceptionally will that be the case. Here are the assertions made about London (T1) in one topic map:[2]

    Birthplace of
      Bulwer-Lytton, Edward
      Lord Byron
    Contains
      Covent Garden Theatre
      Hippodrome
      His/Her Majesty's
      Savoy Theatre
    Died here
      Leoni, Franco
    Located in
      England

Any of these associations (except the located in association) are what the OWL folks call inverse functional properties* and could therefore form the basis for merging (which, after all, is what subject identity is primarily about).

But what are the chances of T2 having one of these associations? Probably very slight. And even if, by some miraculous chance, it did have one of them, you wouldn't know unless you could establish the identity of the associated topic, which might well be referred to by a subtly different name (e.g. "Edward Bulwer-Lytton", "George Gordon Byron, 6th Baron Byron", or "Royal Opera House"). So you have a recursion problem on your hands...

And it doesn't stop there, because you would also have to establish the identity of "Birthplace of", "Contains", and "Died here" - as well as the corresponding role types - otherwise you wouldn't know if the associations involving T1 and T2 really were identical.

In summary, while the TMRM approach might work in limited ways within the confines of a single application, it is doomed to fail in the general case.

So, to return to Alex's original posting:

| We plug away at our Topic Maps, and I for one claim to think in terms
| of subjects, being all subject-centric and all. But am I? I like to
| think I am, but there's that ever-nagging feeling that a subject proxy
| will never quite be right, and that the compromise of subject locators
| / identifiers / indicators is as good as it gets, but not quite
| subject centric now, is it?
|
| Can I ask us all a philosophical question?
| Apart from the TMDM / TMRM mechanisms for identity of subjects, what
| are my alternatives?

As far as I'm concerned, subject identifiers - and published subjects - constitute the only really viable alternative. If they sometimes engender a feeling of compromise, it's in the nature of the problem.

We are not always aware of it in real life, because things generally "just work", but the truth is that every one of us has a slightly different (and continually evolving) concept of, say, London. The true "subject" is just a fuzzy compromise consisting of the most salient shared properties of all those gloriously varied concepts.

But, hey, it works in real life, so why not in Topic Maps as well? I betcha http://psi.ontopedia.net/London would do the trick for 99% of our needs, and for the other 1% we just create additional, more specific PSIs.

Steve

* "If a property is declared to be inverse-functional, then the object of a property statement uniquely determines the subject (some individual)."
  http://www.w3.org/TR/owl-ref/#InverseFunctionalProperty-def
  People can only be born in (or die in) one place, and theatres can only be located in one place, so born in, died in, and located in are sufficient to identify the places concerned. But it doesn't work the other way around: You can't identify a person through the place s/he was born (or died), for obvious reasons.

[1] The information at http://psi.ontopedia.net/London is admittedly a bit on the sparse side. The reason is that it was autogenerated from a topic map that did not contain the capital-of information - nor indeed the fact that city would be a more appropriate type. A better example is http://psi.ontopedia.net/born_in.

[2] http://tinyurl.com/nsyjwo

--
PSI: http://psi.ontopedia.net/Steve_Pepper
Blog: http://topicmaps.wordpress.com
_______________________________________________
topicmapmail mailing list
[email protected]
http://www.infoloom.com/mailman/listinfo/topicmapmail
--- End Message ---
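
The two merging strategies contrasted in the attached message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the API of any Topic Maps engine: the Topic class, the same_subject function and the key_properties parameter are hypothetical names invented for this sketch. The primary rule merges on a shared subject identifier; the fallback rule compares an identifying property (here a data code) together with the topic type.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, simplified topic: not a real Topic Maps engine class."""
    name: str
    topic_type: str
    identifiers: set = field(default_factory=set)    # subject identifiers (URIs)
    properties: dict = field(default_factory=dict)   # e.g. {"code": "NOR"}


def same_subject(a, b, key_properties=("code",)):
    """Decide whether two topics should merge (illustrative rule set only)."""
    # Primary rule: the topics share at least one subject identifier.
    if a.identifiers & b.identifiers:
        return True
    # Fallback rule: the types must match before properties are compared,
    # so that a country and an airport sharing a code are kept apart.
    if a.topic_type != b.topic_type:
        return False
    # The property has to be present in BOTH topics, with equal values.
    return any(
        prop in a.properties
        and prop in b.properties
        and a.properties[prop] == b.properties[prop]
        for prop in key_properties
    )


# Example data taken from the message; the type names are assumptions.
norway = Topic("Norway", "country", properties={"code": "NOR"})
norge = Topic("Norge", "country", properties={"code": "NOR"})
airport = Topic("Nordfjörður Airport", "airport", properties={"code": "NOR"})

psi = "http://psi.ontopedia.net/London"
london_a = Topic("London", "city", identifiers={psi})
london_b = Topic("London, England", "city", identifiers={psi})
london_c = Topic("London", "city", properties={"located_in": "England"})

print(same_subject(norway, norge))       # same type, same code
print(same_subject(norway, airport))     # same code, different type
print(same_subject(london_a, london_b))  # shared subject identifier
print(same_subject(london_a, london_c))  # nothing in common to compare
```

The last comparison shows the weakness the message points out: when the two topic maps share neither an identifier nor the same identifying property, the fallback rule has nothing to work with and the topics stay unmerged.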
