hi christoph,

"That looks like a very useful tool. Out of curiosity: how are the
ontologies/vocabularies loaded?"

i did not actually work on that piece, so i don't know the details. i do know that it is still a work in progress. i've cc'd ravi, who wrote that code (the UI for the application is written in adobe's flex/AIR).

"Yes, the HGNC Gene Symbols are stable, but what about other species?"

good point. for other species, i've usually seen entrez accessions used as the most definitive source.

"So entrez accessions are the 'standard' input format for genes?"

i've seen this more as a de facto best practice for identifying a sequence than as anything formal (but i'm not sure what you mean by a standard input format). it seems to work well for identifying a sequence/gene when communicating things such as gene lists, and the database is well curated.

as we have all noticed, 'what is a gene' does not have a definitive answer. i'm currently working on a personal project where the pragmatic answer is simply 'the set of sequence identifiers that cross reference each other'. by this i mean that when i download the information from HGNC, i get cross references to a number of other sequence databases such as entrez, and when i parse an ADF file i get additional cross references from the local identifiers to entrez, genbank, and so on. the danger, of course, is a bad cross reference that associates, say, two different gene symbols. it is also important to identify which cross references point to a broad category (GO symbols, for instance) and so shouldn't be used. part of the project is to explore this idea and its ramifications.
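
to make the 'cross referencing set' idea concrete, here is a minimal sketch in python. it is not the project's actual code, just one way the grouping could work, using a union-find over identifier strings. the `CrossRefSets` class, the `BROAD_PREFIXES` set, the `PREFIX:value` naming scheme and the identifier values are all assumptions made up for illustration.

```python
from collections import defaultdict

# assumption: prefixes of broad categories that should never merge two genes
BROAD_PREFIXES = {"GO"}


def prefix_of(ident):
    """Return the database prefix of an identifier such as 'HGNC:CFTR'."""
    return ident.split(":", 1)[0]


class CrossRefSets:
    """Union-find that groups identifiers linked by cross references."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Return the representative identifier of ident's set."""
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: point every visited node straight at the root
        while self.parent[ident] != root:
            self.parent[ident], ident = root, self.parent[ident]
        return root

    def add_xref(self, a, b):
        """Record that a and b cross reference each other."""
        if prefix_of(a) in BROAD_PREFIXES or prefix_of(b) in BROAD_PREFIXES:
            return  # skip broad-category links such as GO
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_gene(self, a, b):
        """True if a and b ended up in the same cross-referencing set."""
        return self.find(a) == self.find(b)

    def symbol_conflicts(self, prefix="HGNC"):
        """Return sets that hold more than one symbol (a likely bad xref)."""
        groups = defaultdict(set)
        for ident in list(self.parent):
            if prefix_of(ident) == prefix:
                groups[self.find(ident)].add(ident)
        return [symbols for symbols in groups.values() if len(symbols) > 1]


# illustrative identifiers only
sets = CrossRefSets()
sets.add_xref("HGNC:CFTR", "ENTREZ:1080")     # from an HGNC download
sets.add_xref("ADF:probe_1", "ENTREZ:1080")   # from parsing an ADF file
sets.add_xref("HGNC:CFTR", "GO:0005254")      # ignored: broad category
sets.add_xref("HGNC:STEAP2", "GO:0005254")    # ignored: broad category
```

with these inputs, `sets.same_gene("ADF:probe_1", "HGNC:CFTR")` is true because both reach the same entrez accession, while the shared GO link does not merge CFTR and STEAP2. `symbol_conflicts()` is one possible check for the bad-cross-reference danger: any set that contains two different gene symbols gets flagged for review.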

this leaves the actual work of identifying genes and proteins and their relationships to others more knowledgeable.

cheers,
michael

----- Original Message ----- From: "Christoph Grabmuller" <grabm...@ebi.ac.uk>
To: "mdmiller" <mdmille...@comcast.net>
Cc: "M. Scott Marshall" <mscottmarsh...@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Tuesday, November 09, 2010 2:09 AM
Subject: Re: [BioRDF] Comments from Christoph Grabmuller on BioRDF microarray provenance


On Mon, Nov 8, 2010 at 4:02 PM, mdmiller <mdmille...@comcast.net> wrote:
2) Many 'things' are represented as strings (e.g. genes), which makes
it often impossible to run a federated query against another endpoint.
Gene names might be somewhat consistent for HUGO, but what about other
species? Also, just the simple variance between 'STEAP2' and 'Steap2'
makes a (direct) federated query impossible.

* actually, HGNC Gene Symbols and entrez accessions are very stable. for
ArrayExpress, the ADF file will usually map to one or both of these
identifiers. in practice, i've not seen this to be a problem but for the
paper we didn't go far enough.
--mm

Yes, the HGNC Gene Symbols are stable, but what about other species?
So entrez accessions are the 'standard' input format for genes?

And even with HGNC it's not always that easy. Let's say I want to ask
bio2rdf what the uniprot accession is for the symbol 'CFTR':
http://bio2rdf.org/uniprot:P13569 only contains 'CFTR_HUMAN' and
matching that with 'FILTER regex()' is highly impractical across so
much data.
-cg

3) I like the Excel to RDF converter, but it relies on the user
entering correct namespaces, names and database ids from various
places in a syntactically correct way. This requires knowledge of the
correct databases to choose and the 'correct' uri (many variants to
choose from).
If people just enter strings we are not all that far away from MAGE-TAB.

* i'm involved in an open source project, Annotare, that seeks to put a nice
UI on top of creating MAGE-TAB documents for a bench scientist. part of
that is use of the NCBO tools to make it easy for the creator of the
document to go fetch the appropriate term from the appropriate
ontology/vocabulary. version one has support for EFO built-in, one of the
main goals for version 2 is to make this much easier and much broader.
--mm

That looks like a very useful tool. Out of curiosity: how are the
ontologies/vocabularies loaded?
-cg


