Re: Nature: A call for a public gene Wiki

Matthew Cockerill Wed, 08 Feb 2006 13:20:45 -0800


Eric,

You raise important issues on this. But I take the converse position, I think:


My sense is that:

(a) wiki-style maintenance will prove to be just as valuable for ontologies and semantically tagged information as it has proved to be for hypertext encyclopedia entry maintenance

(b) attempts to capture the benefits of a wiki-style approach while avoiding the core 'anyone can contribute and see their changes immediately without having to go through moderation' principle miss the fact that that very thing is central to the motivation that makes wikis (especially Wikipedia) work so well (and vastly better than most people would guess they might).



To comment on some of your points:

We are currently exploring various strategies to encourage people to let us know when they find errors or omissions in UniProt, or even to contribute data as they publish their research, rather than waiting for a curator to pick up their results from a publication.
In principle all of this has been possible for a long time: We have feedback forms etc, but people don’t make use of these often (or not as often as we would like…). The most frequent requests are from people who have published a paper and would like us to cite them.

I agree that people don't use feedback forms - and I believe the key to that is the lack of motivation, and the lack of assurance that the feedback they send will be listened to and acted on in an appropriate way to justify the time it takes.

The most effective improvement, in my opinion, may be to allow people to directly attach comments to each database entry, like in a blog (simple, instant gratification etc). These comments could then be reviewed by curators, and integrated into the database, if appropriate.

Adding comments does not provide the same motivation as updating the core data.

The Encyclopedia Britannica, with the ability to add comments, is still the Encyclopedia Britannica. In many cases, no one will bother to make comments; in other cases there may be pointless comment debates and comment spam which no one has time to read. Comments are fine (you can make them on BioMed Central articles - e.g. http://www.nutritionj.com/content/4/1/24/comments ), but they aren't paradigm changing.

Having a system where people could add and update data directly, on the other hand, wouldn’t be practical: The amount of training required to enter data in a consistent manner is considerable, and being consistent is essential for large, highly structured data sets like UniProt.

But what would motivate someone to take the time to update an entry in an erroneous and inconsistent way?


The only reason someone would generally do this would be that

(a) the item was lacking in all annotation up to then, and so something is better than nothing

(b) the previous annotation was worse

(c) they are clueless, and have too much time on their hands

The way things seem to work on Wikipedia, in all but a few pathological cases, annotations seem to converge towards surprisingly good quality. The process by which all parties interested in a particular item (or family of items) can and do sign up to be automatically notified of changes to that item seems to be a highly effective way to make sure that changes that get made make sense (and (c), when it occurs, is rectified).

Another approach is to have a hand-picked list of experts who are responsible for certain database entries, according to their area of expertise. These people would be responsible for letting us know if something needs to be updated, though I wonder how many people can be motivated to commit themselves to such a thing.

The critical factor in the end may not just be how easy it is to contribute, but also how much credit can be gained from doing so. Contributors should be listed on each page. Should we go as far as attributing individual facts to contributors? This would allow us to also state who disagrees with something. Should we allow people to rate the contributions of others? This way people could gain reputation through our web site. Somehow I suspect that contributing to public databases like UniProt won’t become common practice until this is something that you can proudly mention in your CV…



I agree that the motivation issue is absolutely key.

Central to what makes Wikipedia work is not simply that you can change it, but that it is a highly useful resource, used by millions of people. All the contributors are users too.

It seems to me that the key to motivation is the ability to make changes yourself which increase the usefulness of the resource to yourself and to others. Using a resource that contains an inaccuracy is a motivation to fix that resource if you can do so directly. To some extent Wikipedia works because a significant fraction of the world has OCD tendencies. If they see something out of place, and can change it, they will, because it makes them feel better, and leaves the world a tidier place (sending a feedback message simply cannot provide this level of reward).


But perhaps more importantly, there are also practical motivations.

Say that I want to link people from my website to a good, standard explanation of (say) what an Impact Factor is. I can link to Wikipedia:

http://en.wikipedia.org/wiki/Impact_factor

But say that the explanation on the site makes a mistake, or omits a key aspect, or a certain link. I'm motivated to improve the wiki entry before I link to it.

The same applies to biologists and bioinformaticists working with UniProt-type data - if there is noise in the data (or missing but vital info in that data), and this means that their automated analyses are missing things, then if it is possible to clean up the data at source, there is an immediate motivation to do so.

As another example of how motivation can drive good curation from the grassroots, on a scale that would be impractical if approached from the top down, imagine we have wiki entries for all scientific authors (not just http://en.wikipedia.org/wiki/Einstein but everyone who's ever published a scientific paper, generated from the literature using automated statistical tools). This could be a really handy resource - not least, a URI for any author for semantic web purposes.

And suppose that you are John Smith, and you discover that you've been lumped in with another John Smith on the same URI because you shared a name, and the statistical analysis tools couldn't spot that your work and career were distinct from those of the other JS (your doppelganger). Assuming that this wiki database of scientific authors and their careers and bibliographies is highly used, you (and/or the other John Smith) would be strongly motivated for practical reasons to disentangle your identities into separate wiki pages. And as a result of doing so, you would be adding additional training data for the algorithms, that could then be used to improve the statistical analysis next time around.
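
The disentangling step described above can be sketched roughly as follows, assuming a merged author page is simply a set of paper identifiers. The function name split_author, the paper ids and the shape of the training pairs are all hypothetical, chosen only to show how one manual split could yield labelled "same author" / "different author" examples for a disambiguation algorithm.

```python
from itertools import combinations, product


def split_author(merged_papers, mine):
    """Split one merged author page into two and derive labelled pairs.

    merged_papers: paper ids wrongly lumped together under one URI.
    mine: the subset that the editing author claims as their own.
    Returns (page_a, page_b, pairs), where each pair is
    (paper_x, paper_y, same_author).
    """
    merged_papers = set(merged_papers)
    mine = set(mine)
    if not mine <= merged_papers:
        raise ValueError("claimed papers must come from the merged page")
    theirs = merged_papers - mine

    pairs = []
    # Papers left on the same page are positive examples.
    for group in (mine, theirs):
        pairs += [(a, b, True) for a, b in combinations(sorted(group), 2)]
    # Papers separated by the split are negative examples.
    pairs += [(a, b, False) for a, b in product(sorted(mine), sorted(theirs))]
    return mine, theirs, pairs


# Hypothetical data: four papers under one "John Smith" page.
page_a, page_b, pairs = split_author({"p1", "p2", "p3", "p4"}, {"p1", "p2"})
```

A single edit by one motivated author thus produces several labelled pairs, which is the sense in which the correction feeds back into the next round of statistical analysis.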


On a related issue:
Pierre wrote:
"a wiki is not a "semantic web" source of information"

My sense is that, to take one example, Wikipedia is a lot closer to a 'semantic web' source of information than is commonly acknowledged. For a start, unlike, say, the Gene Ontology, there is a clearly agreed URI for each concept/entry within Wikipedia.

e.g.
http://en.wikipedia.org/wiki/Gambia
http://en.wikipedia.org/wiki/France

Admittedly, although those entries link to:
http://en.wikipedia.org/wiki/Country
and
http://en.wikipedia.org/wiki/Population

Wikipedia (I think) currently lacks the expressive power to express even simple "is a" or "has a" relationships. But it has the necessary building blocks to make such a thing, and more complicated ontology management, possible.
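
As a rough illustration of what such statements might look like once attached to the URIs above, here is a minimal sketch of subject/predicate/object triples held in memory. The predicate names "is_a" and "has_a" and the TripleStore class are hypothetical; nothing here is a feature of Wikipedia itself.

```python
from collections import defaultdict

WIKI = "http://en.wikipedia.org/wiki/"


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from subject by the given predicate."""
        statements = self._by_subject.get(subject, set())
        return sorted(o for p, o in statements if p == predicate)

    def instances_of(self, cls):
        """All subjects declared to be an instance of cls."""
        return sorted(
            s for s, statements in self._by_subject.items()
            if ("is_a", cls) in statements
        )


store = TripleStore()
store.add(WIKI + "Gambia", "is_a", WIKI + "Country")
store.add(WIKI + "France", "is_a", WIKI + "Country")
store.add(WIKI + "Country", "has_a", WIKI + "Population")

countries = store.instances_of(WIKI + "Country")
```

The point of the sketch is only that the agreed URIs do most of the work: once they exist, an "is a" statement is just one more link with a label on it.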

Why do I keep mentioning Wikipedia, rather than proposing a new Wiki-semanto-pedia? Because I think, just like with the success of Google and eBay, motivating people to update content is an example of positive feedback creating a winner-takes-all outcome. The more Wikipedia is used, the more people are motivated to update it, and the more useful it gets.

If it is possible to give people comprehensible tools to allow them to express (and manage in a wiki way) semantic web relationships within Wikipedia at the same time as human readable text, then I think there is finally a chance to turn the whole semantic web dream into something practically and realistically attainable, and that Wikipedia itself may play an important role in that. After all, there's no quicker way to look up the URI for a given concept than to do a quick search of Wikipedia from your Firefox searchbox...


Matt



On 8 Feb 2006, at 19:44, Eric Jain wrote:


Pierre LINDENBAUM wrote:

I agree, a wiki would be great way for sharing
knowledge as it would allow experts of a protein, of a
gene to freely add, modify and share annotations. But
I fear it could also be a problem for knowledge
discovery  because a wiki is not a "semantic web"
source of information.

I'm also a bit skeptical about how well a wiki would work here, see http://eric.jain.name/2006/02/08/how-to-encourage-contributions/.
