Adrian Midgley wrote:
Annotated terms are one thing, and so far need a human editor to keep them up to date.
And that last large and open-ended part of a large job, which is itself of open scope, is the area where I was thinking a free-form collaborative tool would be useful. Wikipedia impresses me, and the thing I felt was most deficient in the Read Code project was any sort of narrative about what individual terms were useful for, or would usually be taken to mean.
Word lists are another thing. These can be automated: you can enumerate them from source and build a concordance.
But I think there is another interesting approach.
I am sure most of us deal with e-mail 'spam'. Rule-based mail filters only go so far to control it. However, much empirical evidence and practical experience suggests that 'self-learning' systems based on statistical semantics or Bayesian filters achieve better performance than rules in a very short time. (see http://www.paulgraham.com/spam.html "A Plan for Spam")
Apple has included the statistical approach in the native Mail app in Jaguar (OS X 10.2). All the unix mail gurus at umich have switched to it!
The Bayesian filter is available in open source and will be included in a future release of Mozilla. I have seen these systems work; they are amazing. The mail community of the Internet has basically given up on controlling spam at the protocol level, and here comes some simple and elegant client software that does the job!
Pipe all your 'medical records of interest' through something like this and it will soon learn your particular vocabulary. Who cares if it includes non-medical terms? (You can train it to ignore those anyway.) Make it a small-group system and it learns your group's vocabulary.
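To make the idea concrete, here is a minimal sketch of how a Graham-style Bayesian filter could be repurposed as a vocabulary spotter. This is an illustration, not any existing filter's implementation: the class name, the crude tokenizer, and the training texts are all hypothetical, and it follows the general scheme of "A Plan for Spam" (clamped per-token probabilities, combining the most extreme ones) rather than any particular product's code.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Crude lowercase word tokenizer; a real system would do better."""
    return re.findall(r"[a-z']+", text.lower())

class BayesianTermSpotter:
    """Toy Graham-style filter: instead of spam vs. ham, it learns which
    tokens look 'medical' from examples of each kind of text."""

    def __init__(self):
        self.medical = Counter()  # documents containing each token (medical)
        self.other = Counter()    # documents containing each token (other)
        self.n_medical = 0        # medical documents seen so far
        self.n_other = 0          # other documents seen so far

    def train(self, text, is_medical):
        counts = self.medical if is_medical else self.other
        for tok in set(tokenize(text)):
            counts[tok] += 1
        if is_medical:
            self.n_medical += 1
        else:
            self.n_other += 1

    def token_prob(self, tok):
        """Estimate P(medical | token), clamped away from 0 and 1."""
        g = self.other[tok] / max(self.n_other, 1)
        m = self.medical[tok] / max(self.n_medical, 1)
        if g + m == 0:
            return 0.4  # neutral-ish default for unseen tokens
        return min(0.99, max(0.01, m / (m + g)))

    def score(self, text):
        """Combine the most decisive token probabilities (naive Bayes)."""
        probs = sorted((self.token_prob(t) for t in set(tokenize(text))),
                       key=lambda p: abs(p - 0.5), reverse=True)[:15]
        if not probs:
            return 0.5
        prod = math.prod(probs)
        inv = math.prod(1 - p for p in probs)
        return prod / (prod + inv)
```

Used the way the paragraph above suggests, each user (or small group) trains it incrementally on their own documents, and the token counts become, in effect, a self-maintaining word list:

```python
spotter = BayesianTermSpotter()
spotter.train("patient presented with acute myocardial infarction", True)
spotter.train("meeting agenda budget review lunch", False)
spotter.score("acute infarction noted")  # high: looks medical
```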
Such a system is incrementally updated and kept current by the only people who care about it: its users.
