Re: [BioRDF] Comments from Christoph Grabmuller on BioRDF microarray provenance

mdmiller Tue, 09 Nov 2010 08:47:11 -0800

hi christoph,

"That looks like a very useful tool. Out of curiosity: how are the
ontologies/vocabularies loaded?"

i did not actually work on that piece so i don't know the details. i do know that it is still a work in progress. i've cc'd ravi, who is the one who wrote that code (the UI for the application is written in adobe's flex air).


"Yes, the HGNC Gene Symbols are stable, but what about other species?"

good point, that's where i've seen entrez accessions usually used as the most definitive source.


"So entrez accessions are the 'standard' input format for genes?"

this i've seen as more of a de facto best practice for identifying a sequence than anything else (but i'm not sure what you mean by a standard input format). it seems to work well for identifying a sequence/gene for the purpose of communicating things such as gene lists, and the database is well curated.

as we have all noticed, 'what is a gene' is not something that has a definitive answer. i'm currently working on a personal project where the pragmatic answer for 'what is a gene' is simply the 'cross referencing set of sequence identifiers'. by this i mean that when i download the information from HGNC, i get cross references to a number of other sequence databases such as entrez. when i parse an ADF file i get additional cross references from the local identifiers to entrez, genbank, and so on. the danger of this, of course, is a bad cross reference that associates, say, two different gene symbols. it is also important to distinguish which cross references are to a broad category (GO symbols, for instance) and so shouldn't be used. part of the project is to explore this idea and its ramifications.
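to make the 'cross referencing set' idea concrete, here is a minimal sketch in python. it is not the code from the project: the class name, the namespace labels (HGNC_SYMBOL, ENTREZ, ADF_LOCAL) and every identifier in the usage lines are made-up placeholders. it merges identifiers that cross reference each other with a union-find, skips broad-category namespaces such as GO, and reports any merged set that ends up holding more than one gene symbol, which is the sign of a bad cross reference.

```python
from collections import defaultdict

# Assumption: namespaces listed here are broad categories and are never merged on.
BROAD_NAMESPACES = {"GO"}


class XrefSets:
    """Union-find over (namespace, identifier) pairs (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        # Register unseen nodes as their own root.
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def add_xref(self, a, b):
        # Ignore cross references that touch a broad-category namespace.
        if a[0] in BROAD_NAMESPACES or b[0] in BROAD_NAMESPACES:
            return
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        out = defaultdict(set)
        for node in list(self.parent):
            out[self.find(node)].add(node)
        return list(out.values())

    def conflicts(self, namespace="HGNC_SYMBOL"):
        # A set holding two or more symbols suggests a bad cross reference.
        bad = []
        for group in self.groups():
            symbols = sorted(ident for ns, ident in group if ns == namespace)
            if len(symbols) > 1:
                bad.append(symbols)
        return bad


# Placeholder data, not real identifiers.
x = XrefSets()
x.add_xref(("HGNC_SYMBOL", "GENE_A"), ("ENTREZ", "1001"))
x.add_xref(("ADF_LOCAL", "probe_17"), ("ENTREZ", "1001"))
x.add_xref(("HGNC_SYMBOL", "GENE_A"), ("GO", "GO:0000001"))  # skipped: broad category
x.add_xref(("HGNC_SYMBOL", "GENE_B"), ("ENTREZ", "1001"))    # simulated bad xref
print(x.conflicts())
```

the conflict check only flags suspicious sets. deciding which cross reference is actually wrong still needs a human or a more trusted source.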

this leaves to others more knowledgeable the actual identification work of genes and proteins and their relationship.


cheers,
michael

----- Original Message -----
From: "Christoph Grabmuller" <grabm...@ebi.ac.uk>

To: "mdmiller" <mdmille...@comcast.net>

Cc: "M. Scott Marshall" <mscottmarsh...@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>

Sent: Tuesday, November 09, 2010 2:09 AM

Subject: Re: [BioRDF] Comments from Christoph Grabmuller on BioRDF microarray provenance



On Mon, Nov 8, 2010 at 4:02 PM, mdmiller <mdmille...@comcast.net> wrote:

2) Many 'things' are represented as strings (e.g. genes), which makes
it often impossible to run a federated query against another endpoint.
Gene names might be somewhat consistent for HUGO, but what about other
species? Also, just the simple variance between 'STEAP2' and 'Steap2'
makes a (direct) federated query impossible.

* actually, HGNC Gene Symbols and entrez accessions are very stable. for
ArrayExpress, the ADF file will usually map to one or both of these
identifiers. in practice, i've not seen this to be a problem but for the
paper we didn't go far enough.
--mm


Yes, the HGNC Gene Symbols are stable, but what about other species?
So entrez accessions are the 'standard' input format for genes?

And even with HGNC it's not always that easy. Let's say I want to ask
bio2rdf what the uniprot accession is for the symbol 'CFTR':
http://bio2rdf.org/uniprot:P13569 only contains 'CFTR_HUMAN' and
matching that with 'FILTER regex()' is highly impractical across so
much data.
-cg
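One way around the exact-string problem, sketched in Python under stated assumptions: normalize each symbol once when the data is loaded and look it up by exact key, instead of running a regex over every literal at query time. The function names and the species labels are hypothetical, and the two STEAP2 accessions are placeholders, not real identifiers. Only the CFTR / P13569 pairing comes from the example above. The species is part of the key because folding case alone would merge 'STEAP2' and 'Steap2' even when they are meant as symbols from different species.

```python
def build_index(records):
    """Map (species, case-folded symbol) to a set of accessions."""
    index = {}
    for species, symbol, accession in records:
        index.setdefault((species, symbol.casefold()), set()).add(accession)
    return index


def lookup(index, species, symbol):
    """Exact-key lookup; returns a sorted list, empty if nothing matches."""
    return sorted(index.get((species, symbol.casefold()), ()))


# Assumption: records arrive as (species, symbol, accession) triples.
records = [
    ("human", "CFTR", "P13569"),
    ("human", "STEAP2", "ACC_HUMAN_PLACEHOLDER"),
    ("mouse", "Steap2", "ACC_MOUSE_PLACEHOLDER"),
]
index = build_index(records)

print(lookup(index, "human", "cftr"))
print(lookup(index, "human", "Steap2"))
print(lookup(index, "mouse", "STEAP2"))
```

This only helps where the data can be indexed locally first. It does not make a direct federated query against a remote endpoint any easier.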

3) I like the Excel to RDF converter, but it relies on the user
entering correct namespaces, names and database ids from various
places in a syntactically correct way. This requires knowledge of the
correct databases to choose and the 'correct' uri (many variants to
choose from).
If people just enter strings we are not all that far away from MAGE-TAB.

* i'm involved in an open source project, Annotare, that seeks to put a nice
UI on top of creating MAGE-TAB documents for a bench scientist. part of
that is use of the NCBO tools to make it easy for the creator of the
document to go fetch the appropriate term from the appropriate
ontology/vocabulary. version one has support for EFO built-in; one of the
main goals for version 2 is to make this much easier and much broader.
--mm


That looks like a very useful tool. Out of curiosity: how are the
ontologies/vocabularies loaded?
-cg
