Eric,
You raise important issues on this. But I take the converse position, I think:

My sense is that:
(a) wiki-style maintenance will prove to be just as valuable for ontologies and semantically tagged information as it has proved to be for maintaining hypertext encyclopedia entries
(b) attempts to capture the benefits of a wiki-style approach while avoiding the core 'anyone can contribute and see their changes immediately without having to go through moderation' principle miss the fact that this very thing is central to the motivation that makes wikis (especially Wikipedia) work so well (and vastly better than most people would guess they might).


To comment on some of your points:

We are currently exploring various strategies to encourage people to let us know when they find errors or omissions in UniProt, or even to contribute data as they publish their research, rather than waiting for a curator to pick up their results from a publication.

In principle all of this has been possible for a long time: we have feedback forms etc., but people don’t make use of these often (or not as often as we would like…). The most frequent requests are from people who have published a paper and would like us to cite them.

I agree that people don't use feedback forms - and I believe the key to that is the lack of motivation, and the lack of assurance that the feedback they send will be listened to and acted on in a way that justifies the time it takes.


The most effective improvement, in my opinion, may be to allow people to directly attach comments to each database entry, like in a blog (simple, instant gratification etc). These comments could then be reviewed by curators, and integrated into the database, if appropriate.

Adding comments does not provide the same motivation as updating the core data.

The Encyclopaedia Britannica, with the ability to add comments, is still the Encyclopaedia Britannica. In many cases, no one will bother to make comments; in other cases there may be pointless comment debates and comment spam which no one has time to read. Comments are fine (you can make them on BioMed Central articles - e.g. http://www.nutritionj.com/content/4/1/24/comments ), but they aren't paradigm changing.



Having a system where people could add and update data directly, on the other hand, wouldn’t be practical: The amount of training required to enter data in a consistent manner is considerable, and being consistent is essential for large, highly structured data sets like UniProt.


But what would motivate someone to take the time to update an entry in an erroneous and inconsistent way?

Generally, the only reasons someone would do this are that
(a) the item was lacking in all annotation up to then, and so something is better than nothing
(b) the previous annotation was worse
(c) they are clueless, and have too much time on their hands

The way things seem to work on Wikipedia, in all but a few pathological cases, annotations converge towards surprisingly good quality. The process by which all parties interested in a particular item (or family of items) can and do sign up to be automatically notified of changes to that item seems to be a highly effective way to make sure that the changes that get made make sense (and that (c), when it occurs, is rectified).

Another approach is to have a hand-picked list of experts who are responsible for certain database entries, according to their area of expertise. These people would be responsible for letting us know if something needs to be updated, though I wonder how many people can be motivated to commit themselves to such a thing. The critical factor in the end may not just be how easy it is to contribute, but also how much credit can be gained from doing so. Contributors should be listed on each page. Should we go as far as attributing individual facts to contributors? This would allow us to also state who disagrees with something. Should we allow people to rate the contributions of others? This way people could gain reputation through our web site. Somehow I suspect that contributing to public databases like UniProt won’t become common practice until this is something that you can proudly mention in your CV…


I agree that the motivation issue is absolutely key.
Central to what makes Wikipedia work is not simply that you can change it, but that it is a highly useful resource, used by millions of people. All the contributors are users too.

It seems to me that the key to motivation is the ability to make changes yourself that increase the usefulness of the resource to you and to others. Finding an inaccuracy in a resource you use is a motivation to fix it, if you can do so directly. To some extent Wikipedia works because a significant fraction of the world has OCD tendencies. If they see something out of place, and can change it, they will, because it makes them feel better and leaves the world a tidier place (sending a feedback message simply cannot provide this level of reward).

But perhaps more importantly, there are practical motivations too.

Say that I want to link people from my website to a good, standard explanation of (say) what an Impact Factor is. I can link to Wikipedia:
http://en.wikipedia.org/wiki/Impact_factor
But say that the explanation on the site makes a mistake, or omits a key aspect, or a certain link. I'm motivated to improve the wiki entry before I link to it.

The same applies to biologists and bioinformaticists working with UniProt-type data - if there is noise in the data (or vital info missing from it), and this means that their automated analyses are missing things, then if it is possible to clean up the data at source, there is an immediate motivation to do so.

Another example of how motivation can drive good curation from the grassroots, at a scale that would be impractical if approached from the top down: imagine we have wiki entries for all scientific authors (not just http://en.wikipedia.org/wiki/Einstein but everyone who has ever published a scientific paper, generated from the literature using automated statistical tools). This could be a really handy resource; not least, it would provide a URI for any author for semantic web purposes.

And suppose that you are John Smith, and you discover that you've been lumped in with another John Smith on the same URI because you share a name, and the statistical analysis tools couldn't spot that your work and career were distinct from those of the other JS (your doppelganger). Assuming that this wiki database of scientific authors and their careers and bibliographies is highly used, you (and/or the other John Smith) would be strongly motivated for practical reasons to disentangle your identities into separate wiki pages. And as a result of doing so, you would be adding training data for the algorithms, which could then be used to improve the statistical analysis next time around.

On a related issue:
Pierre wrote:
"a wiki is not a "semantic web" source of information"

My sense is that, to take one example, Wikipedia is a lot closer to a 'semantic web' source of information than is commonly acknowledged. For a start, unlike, say, the Gene Ontology, there is a clearly agreed URI for each concept/entry within Wikipedia.
e.g.
http://en.wikipedia.org/wiki/Gambia
http://en.wikipedia.org/wiki/France

Admittedly, although those entries link to:
http://en.wikipedia.org/wiki/Country
and
http://en.wikipedia.org/wiki/Population

Wikipedia (I think) currently lacks the expressive power to express even simple "is a" or "has a" relationships. But it has the necessary building blocks to make such a thing, and more complicated ontology management, possible.
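
To make the "building blocks" point concrete, here is a minimal sketch of what such relationships might look like if Wikipedia URIs were used as the concept identifiers. The predicate names "is_a" and "has_a", the helper functions, and the idea that a country "has a" population are all my own illustrative assumptions, not anything Wikipedia currently offers:

```python
# Hypothetical sketch: Wikipedia URIs as concept identifiers in simple
# subject-predicate-object statements. The predicates "is_a" and "has_a"
# are illustrative assumptions, not an existing Wikipedia feature.
WIKI = "http://en.wikipedia.org/wiki/"

triples = {
    (WIKI + "Gambia", "is_a", WIKI + "Country"),
    (WIKI + "France", "is_a", WIKI + "Country"),
    (WIKI + "Country", "has_a", WIKI + "Population"),
}


def instances_of(concept, statements):
    """Return the sorted subjects asserted to be an instance of `concept`."""
    return sorted(s for (s, p, o) in statements if p == "is_a" and o == concept)


def properties_of(entry, statements):
    """Return the sorted 'has_a' objects of `entry`, including those it
    picks up from the concepts it is directly declared to be an instance of."""
    classes = {o for (s, p, o) in statements if s == entry and p == "is_a"}
    holders = classes | {entry}
    return sorted(o for (s, p, o) in statements if p == "has_a" and s in holders)


if __name__ == "__main__":
    print(instances_of(WIKI + "Country", triples))
    print(properties_of(WIKI + "Gambia", triples))
```

Even something this crude would let a program ask "which entries are countries?", which plain hyperlinks between the pages cannot answer.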

Why do I keep mentioning Wikipedia, rather than proposing a new Wiki-semanto-pedia? Because I think that, just as with the success of Google and eBay, motivating people to update content is an example of positive feedback creating a winner-takes-all outcome. The more Wikipedia is used, the more people are motivated to update it, and the more useful it gets.

If it is possible to give people comprehensible tools that allow them to express (and manage in a wiki way) semantic web relationships within Wikipedia alongside the human-readable text, then I think there is finally a chance to turn the whole semantic web dream into something practically and realistically attainable, and Wikipedia itself may play an important role in that. After all, there's no quicker way to look up the URI for a given concept than to do a quick search of Wikipedia from your Firefox search box...

Matt



On 8 Feb 2006, at 19:44, Eric Jain wrote:


Pierre LINDENBAUM wrote:
I agree, a wiki would be a great way for sharing
knowledge as it would allow experts of a protein, of a
gene to freely add, modify and share annotations. But
I fear it could also be a problem for knowledge
discovery because a wiki is not a "semantic web"
source of information.

I'm also a bit skeptical about how well a wiki would work here, see http://eric.jain.name/2006/02/08/how-to-encourage-contributions/.



