Eric,
You raise important issues here, but I think I take the opposite position.
My sense is that:
(a) wiki-style maintenance will prove to be just as valuable for
ontologies and semantically tagged information as it has proved to
be for maintaining hypertext encyclopedia entries;
(b) attempts to capture the benefits of a wiki-style approach while
avoiding its core feature ('anyone can contribute and see their
changes immediately, without having to go through moderation') miss
the fact that this very feature is central to the motivation that
makes wikis (especially Wikipedia) work so well, and vastly better
than most people would guess they might.
To comment on some of your points:
We are currently exploring various strategies to encourage people
to let us know when they find errors or omissions in UniProt, or
even to contribute data as they publish their research, rather than
waiting for a curator to pick up their results from a publication.
In principle all of this has been possible for a long time: we have
feedback forms etc., but people don’t make use of these often (or
not as often as we would like…). The most frequent requests are
from people who have published a paper and would like us to cite them.
I agree that people don't use feedback forms, and I believe the key
to that is the lack of motivation, and the lack of assurance that the
feedback they send will be listened to and acted on in an appropriate
way, enough to justify the time it takes.
The most effective improvement, in my opinion, may be to allow
people to directly attach comments to each database entry, like in
a blog (simple, instant gratification etc). These comments could
then be reviewed by curators, and integrated into the database, if
appropriate.
Adding comments does not provide the same motivation as updating the
core data.
The Encyclopedia Britannica, with the ability to add comments, is
still the Encyclopedia Britannica.
In many cases, no one will bother to make comments; in other cases
there may be pointless comment debates and comment spam that no one
has time to read. Comments are fine (you can make them on BioMed
Central articles, e.g. http://www.nutritionj.com/content/4/1/24/comments),
but they aren't paradigm changing.
Having a system where people could add and update data directly, on
the other hand, wouldn’t be practical: The amount of training
required to enter data in a consistent manner is considerable, and
being consistent is essential for large, highly structured data
sets like UniProt.
But what would motivate someone to take the time to update an entry
in an erroneous and inconsistent way?
The only reasons someone would generally do this would be that:
(a) the item was lacking in all annotation up to then, and so
something is better than nothing
(b) the previous annotation was worse
(c) they are clueless, and have too much time on their hands
On Wikipedia, in all but a few pathological cases, annotations seem
to converge towards surprisingly good quality.
The process by which all parties interested in a particular item
(or family of items) can, and do, sign up to be automatically notified
of changes to that item seems to be a highly effective way to make
sure that the changes that get made make sense (and that (c), when it
occurs, is rectified).
Another approach is to have a hand-picked list of experts who are
responsible for certain database entries, according to their area
of expertise. These people would be responsible for letting us know
if something needs to be updated, though I wonder how many people
can be motivated to commit themselves to such a thing.
The critical factor in the end may not just be how easy it is to
contribute, but also how much credit can be gained from doing so.
Contributors should be listed on each page. Should we go as far as
attributing individual facts to contributors? This would allow us
to also state who disagrees with something. Should we allow people
to rate the contributions of others? This way people could gain
reputation through our web site. Somehow I suspect that
contributing to public databases like UniProt won’t become common
practice until this is something that you can proudly mention in
your CV…
I agree that the motivation issue is absolutely key.
Central to what makes Wikipedia work is not simply that you can
change it, but that it is a highly useful resource, used by millions
of people. All the contributors are users too.
It seems to me that the key to motivation is the ability to make
changes yourself that increase the usefulness of the resource, both to
yourself and to others.
Using a resource that contains an inaccuracy is a motivation to fix
it, if you can do so directly. To some extent Wikipedia works because
a significant fraction of the world has OCD tendencies. If they see
something out of place, and can change it, they will, because it makes
them feel better and leaves the world a tidier place (sending a
feedback message simply cannot provide this level of reward).
But perhaps more importantly, there are practical motivations too.
Say that I want to link people from my website to a good, standard
explanation of (say) what an Impact Factor is. I can link to Wikipedia:
http://en.wikipedia.org/wiki/Impact_factor
But say that the explanation on the site makes a mistake, or omits a
key aspect or an important link. Then I'm motivated to improve the wiki
entry before I link to it.
The same applies to biologists and bioinformaticians working with
UniProt-type data: if there is noise in the data (or vital information
missing from it), and this means that their automated analyses are
missing things, then, if it is possible to clean up the data at
source, there is an immediate motivation to do so.
As another example of how motivation can drive good curation from
the grassroots, on a scale that would be impractical if approached from
the top down, imagine we have wiki entries for all scientific authors
(not just http://en.wikipedia.org/wiki/Einstein but everyone who's
ever published a scientific paper, generated from the literature
using automated statistical tools).
This could be a really handy resource, not least as a URI for any
author for semantic web purposes.
And suppose that you are John Smith, and you discover that you've
been lumped in with another John Smith on the same URI because you
share a name, and the statistical analysis tools couldn't spot that
your work and career were distinct from the other JS (your doppelganger).
Assuming that this wiki database of scientific authors and their
careers and bibliographies is highly used, you (and/or the other John
Smith) would be strongly motivated for practical reasons to
disentangle your identities into separate wiki pages. And as a result
of doing so, you would be adding additional training data for the
algorithms, that could then be used to improve the statistical
analysis next time around.
On a related issue:
Pierre wrote:
"a wiki is not a "semantic web" source of information"
My sense is that, to take one example, Wikipedia is a lot closer to a
'semantic web' source of information than is commonly acknowledged.
For a start, unlike, say, the Gene Ontology, there is a clearly
agreed URI for each concept/entry within Wikipedia.
e.g.
http://en.wikipedia.org/wiki/Gambia
http://en.wikipedia.org/wiki/France
Admittedly, although those entries link to:
http://en.wikipedia.org/wiki/Country
and
http://en.wikipedia.org/wiki/Population
Wikipedia (I think) currently lacks the means to express even simple
"is a" or "has a" relationships.
But it has the necessary building blocks to make such relationships,
and more complicated ontology management, possible.
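To make the "building blocks" point concrete, here is a minimal sketch of how the existing Wikipedia URIs above could serve as identifiers in simple "is a" / "has a" statements. The predicate names `is_a` and `has_a` and the helper `subjects` are hypothetical and for illustration only. This is not an existing Wikipedia feature or a standard RDF vocabulary.

```python
# Hypothetical sketch: Wikipedia article URIs used as identifiers in
# subject-predicate-object triples. "is_a" and "has_a" are illustrative
# predicate names, not an existing Wikipedia or RDF vocabulary.
WP = "http://en.wikipedia.org/wiki/"

triples = {
    (WP + "Gambia", "is_a", WP + "Country"),
    (WP + "France", "is_a", WP + "Country"),
    (WP + "Gambia", "has_a", WP + "Population"),
    (WP + "France", "has_a", WP + "Population"),
}


def subjects(predicate, obj):
    """Return, sorted, every subject URI linked to obj by the given predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


if __name__ == "__main__":
    # Which entries are countries? Prints the France and Gambia URIs.
    print(subjects("is_a", WP + "Country"))
```

The URIs already do the identification work. What the sketch takes for granted, and what Wikipedia would still need, is an agreed, wiki-editable way of stating the relationships themselves.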
Why do I keep mentioning Wikipedia, rather than proposing a new
Wiki-semanto-pedia?
Because I think that, just as with the success of Google and eBay,
motivating people to update content is an example of positive feedback
creating a winner-takes-all effect.
The more Wikipedia is used, the more people are motivated to update
it, and the more useful it gets.
If it is possible to give people comprehensible tools that allow them
to express (and manage in a wiki way) semantic web relationships
within Wikipedia alongside human-readable text, then I think there is
finally a chance to turn the whole semantic web dream into something
practically and realistically attainable, and that Wikipedia itself
may play an important role in that. After all, there's no quicker way
to look up the URI for a given concept than to do a quick search of
Wikipedia from your Firefox search box...
Matt
On 8 Feb 2006, at 19:44, Eric Jain wrote:
Pierre LINDENBAUM wrote:
I agree, a wiki would be a great way of sharing
knowledge, as it would allow experts on a protein or a
gene to freely add, modify and share annotations. But
I fear it could also be a problem for knowledge
discovery, because a wiki is not a "semantic web"
source of information.
I'm also a bit skeptical about how well a wiki would work here, see
http://eric.jain.name/2006/02/08/how-to-encourage-contributions/.