Hello everyone,

Sorry that I could not make it to the recent telecons.

---
Summary and status of the tasks for the Parkinson's disease demo we planned 
during the F2F (in my understanding):
---

* Convert Senselab/NeuronDB [1] to RDF (done by Kei and his group). STATUS: 
almost done. However, when viewing the OWL file in Protege at the F2F I was 
still missing a lot of the data that is available on the NeuronDB website -- it 
seemed that the file consisted only of classes and dummy instances, but no 
relations (which are the most important thing we can derive from NeuronDB). 
Maybe Kei could shed light on this issue?

* Debug the Senselab/NeuronDB OWL file. STATUS: ?

* Convert PDSP KiDB to OWL (done by myself). STATUS: done.

* Debug the PDSP KiDB OWL [2] file with Pellet. STATUS: almost done. A single, 
elusive error remains, but this will soon be found. Pellet really is a great 
aid in debugging OWL -- at least much better than Protege (thanks for the tip, 
Alan).

* Convert MeSH [3] to OWL (done by myself). STATUS: done, already available as 
a SKOS file from [4].

* 'Convert' Pubchem in order to yield the relation between a CAS number from 
the PDSP KiDB and concepts from MeSH. This is more problematic than I thought. 
At the F2F, I suggested using Pubchem only to extract the relation between CAS 
numbers and MeSH annotations. Some people also suggested that as much as 
possible of Pubchem should be converted or made accessible via wrappers. 
However, I think I did not stress enough that this would be quite a demanding 
task, as Pubchem is not only quite complex but also very large: the XML export 
of Pubchem is hundreds of gigabytes in size. Furthermore, it seems that the 
static exports available via the FTP site of Pubchem do not contain all of the 
necessary information (e.g. MeSH annotations); these are only contained in 
files that are the results of a search.
Therefore, I would still suggest focusing on simply extracting the CAS number - 
MeSH relation. I would also suggest limiting the conversion to a small, 
selected set of records that are useful for the demonstration.
STATUS: I queried the 'Pubchem Substance' database with the search string 
'parkinson OR antiparkinsonian OR huntington OR dyskinesia OR hallucinogen OR 
neurotoxic OR serotonin OR dopamine OR glutamate', which gave over a thousand 
results. These results were saved as XML, and XQuery was used to extract the 
CAS number - MeSH relations from the result set. Unfortunately, the end result 
turned out to be less useful than expected. This is partly because the 
metadata scheme of the Pubchem exports is not very clean: the MeSH terms are 
mixed with other kinds of annotations, and they are represented as strings 
(e.g. 'ANTIPARKINSONIAN AGENTS') rather than as MeSH IDs. Very annoying.
I will continue to explore the data in Pubchem, but the first explorations were 
a bit disappointing. I hope I will find more useful results; otherwise we will 
need to rethink the structure of the demonstration a bit.

* Dissemination of the results, query mechanism, website and interface for the 
demonstration. STATUS: nothing done yet. I would suggest that, for the time 
being, we try to build a coherent semantic network out of all data sources and 
put it in a single triplestore. Once this seems to work, we should try to 
simulate a distributed environment, where each data source and the mappings 
between data sources are located on different SPARQL endpoints that can be 
queried via federated SPARQL queries. Many people at the F2F (Vipul and 
others) suggested a different solution: a federated query based on the 
Parkinson seed ontology, without requiring a mapping of the original data 
sources. The query algorithms would have to be written by our group and would 
give the user only limited possibilities for making queries (at least that was 
my understanding of the issue; please correct me if I am wrong). This will 
probably lead to a heated discussion in a few months.
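
For the Pubchem item above, one possible way to untangle the mixed annotation 
strings is sketched below in Python. This is only a rough sketch under 
assumptions: the function names, the `mesh_ids_by_label` lookup table (which 
would have to be built from the MeSH label strings, e.g. from the SKOS file 
[4]) and the placeholder ID in the usage note are all hypothetical and not part 
of any existing code. The only fixed ingredient is the standard CAS check-digit 
rule, which is used to recognise CAS numbers among the strings.

```python
import re

# CAS format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_cas_number(text):
    """Return True if text looks like a CAS number with a valid check digit."""
    match = CAS_PATTERN.match(text.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left; the sum modulo 10
    # must equal the check digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def normalize(label):
    """Collapse whitespace and ignore case so label lookups are tolerant."""
    return " ".join(label.split()).casefold()


def split_annotations(annotations, mesh_ids_by_label):
    """Sort raw annotation strings into CAS numbers, MeSH IDs and leftovers.

    mesh_ids_by_label is a hypothetical dict that maps normalized MeSH
    labels to MeSH IDs; it is assumed to be prepared beforehand.
    """
    cas_numbers, mesh_ids, unmatched = [], [], []
    for raw in annotations:
        if is_cas_number(raw):
            cas_numbers.append(raw.strip())
        elif normalize(raw) in mesh_ids_by_label:
            mesh_ids.append(mesh_ids_by_label[normalize(raw)])
        else:
            unmatched.append(raw)  # keep for manual inspection
    return cas_numbers, mesh_ids, unmatched
```

Called with the strings of one substance record and a lookup table such as 
`{"antiparkinsonian agents": "PLACEHOLDER-ID"}`, this would return the CAS 
numbers, the matched IDs and the strings that could not be classified. Whether 
exact label matching is good enough for the Pubchem strings is an open 
question; the leftover list is meant to show how much is lost.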


kind regards,
Matthias Samwald



[1] http://senselab.med.yale.edu/senselab/
[2] http://pdsp.med.unc.edu/pdsp.php
[3] http://www.nlm.nih.gov/mesh/
[4] http://neuroscientific.net/index.php?id=download

