existing semantic models for normal conditions of metabolites in bodily fluids?

2016-08-13 Thread Egon Willighagen
Hi all,

I'm extracting some metabolite-disease relationships and the book I'm
reading also lists normal concentrations of metabolites in various bodily
fluids for various age groups.

For example, for Phe 0-1 years in serum it lists <80 micromolar
("newborns" is another age group, but most are like "x-y years").

Has anyone encoded such information in a semantic manner already? What
should I be reading?
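
As a purely illustrative sketch (not an existing semantic model), the Phe example above could first be captured as a structured record before mapping it to any ontology. All class and field names here are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceRange:
    """One normal-concentration entry; all field names are hypothetical."""
    metabolite: str
    fluid: str
    age_min_years: Optional[float]  # None when the book uses a label such as "newborns"
    age_max_years: Optional[float]
    upper_limit: float              # the book lists "<80", so the bound is exclusive
    unit: str
    age_label: Optional[str] = None

    def is_normal(self, concentration: float) -> bool:
        # Strictly below the listed upper limit counts as normal.
        return concentration < self.upper_limit


# The example from the message: Phe, 0-1 years, serum, <80 micromolar.
phe_infant = ReferenceRange("Phe", "serum", 0, 1, 80.0, "micromolar")
```

A record like this makes explicit which pieces (metabolite, fluid, age interval, bound, unit) a semantic encoding would need to cover.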

Looking forward to hearing from you,

grtz,

Egon

-- 
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: -0001-7542-0286
ImpactStory: https://impactstory.org/EgonWillighagen



Re: FDA: Semantic Web Technologies Fellowship

2013-04-08 Thread Egon Willighagen
On Mon, Apr 8, 2013 at 6:50 PM, Michel Dumontier
michel.dumont...@gmail.com wrote:
  Charlie, Eric, Kerstin, and others, keep up your good work to communicate
 the benefits of Semantic Web technologies in simplifying and improving the
 delivery of knowledge across the regulatory and pharmaceutical, clinical and
 health care jurisdictions.

Indeed!

Egon


--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Fwd: [Ops-ttf] Fwd: DrugBank not available?

2012-12-07 Thread Egon Willighagen
Hi all,

 That sounds like a great idea, but only if the metadata itself is maintained, 
 which is generally hard as these things fall into disrepair. For example the
 HCLS LODD entry for the DrugBank RDF [1] claims that it is updated regularly 
 when in reality it has not been updated since 17 November 2008.
 [1] http://www.w3.org/wiki/HCLSIG/LODD/Data

I got this on another mailing list... I think we really need to go
through that list... I guess the Linked Life Data task force is the
most likely group to do that? I will bring this up this Monday...

Egon





Re: Can we please stop all those CfPs and use SemWeb technologies instead?

2012-05-29 Thread Egon Willighagen
On Tue, May 29, 2012 at 4:42 PM, Bernadette Hyland
bhyl...@3roundstones.com wrote:
 Perhaps use a mail filter rule that includes CfP or other subject tags so
 you aren't flooded.  I for one find the calls useful and not much of any
 annoyance.

If only that was consistently used... I find a list, such as Michel
pointed out *very* useful, but I also find myself ignoring the list
now, because it is mostly CfPs... :(

I will give filtering a try, and hope that everyone will remember to
add CfP to the subject!

Michel, Bernadette, thanx for replying!

Egon




Can we please stop all those CfPs and use SemWeb technologies instead?

2012-05-28 Thread Egon Willighagen
Hi all,

my inbox is flooded with calls for papers for the many SemWeb
conferences, meetings, special issues, ...

This is silly: can we please use SemWeb technologies for this instead,
so that when I have something to submit, I can just query for upcoming
meetings, issues, etc?
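
A minimal sketch of the kind of query meant here, assuming calls were published as structured records. In practice this would be a SPARQL query over RDF; the plain-Python filter below only illustrates the idea, and every record in it is made up.

```python
from datetime import date

# Hypothetical call-for-papers records; names and dates are invented for illustration.
CALLS = [
    {"name": "Example Workshop A", "kind": "workshop", "deadline": date(2012, 6, 15)},
    {"name": "Example Journal Issue B", "kind": "special issue", "deadline": date(2012, 5, 1)},
    {"name": "Example Conference C", "kind": "conference", "deadline": date(2012, 9, 30)},
]


def upcoming_calls(calls, today):
    """Return calls whose deadline is on or after `today`, soonest first."""
    open_calls = [c for c in calls if c["deadline"] >= today]
    return sorted(open_calls, key=lambda c: c["deadline"])


names = [c["name"] for c in upcoming_calls(CALLS, date(2012, 5, 28))]
```

The point is that the submitter pulls the list when needed, instead of every call being pushed to every inbox.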

Egon




Predictive Toxicology with Cheminformatics and SemWeb (Fwd: Open postdoc position in Uppsala)

2012-04-18 Thread Egon Willighagen
Hi all,

the Bioclipse team is looking for someone with good semweb skills to
apply them in predictive toxicology, for a two-year postdoc position.

See the announcement below, or follow this link:
http://www.uu.se/jobb/others/annonsvisning?tarContentId=186211&languageId=1

Egon


-- Forwarded message --
From: Ola Spjuth ola.spj...@farmbio.uu.se
Date: Wed, Apr 18, 2012 at 6:33 PM
Subject: [OTDev] Open postdoc position in Uppsala
To: opentox development mailing list developm...@opentox.org


Dear all,

I would like to draw your attention to an open postdoc position in
Uppsala which is very relevant to the goals of OpenTox, and would be
happy if you could share this with your colleagues.

Kind regards,

Ola Spjuth


--
Postdoc position in cheminformatics, bioinformatics, or computer
science - applications in predictive toxicology
We have an open postdoc position in the group of Pharmaceutical
Bioinformatics at the Department of Pharmaceutical Biosciences,
Uppsala University, Sweden.

The successful applicant will conduct research on data
interoperability for predictive toxicology, and especially design and
implement an infrastructure consisting of a database and user
interfaces for data and predictive models in toxicology. Of particular
interest will be to merge chemical and biological data within a
semantic framework, and link toxicity data to genomics and
metabolomics data (toxicogenomics) with a connection to the Bioclipse
framework (www.bioclipse.net). PhD degree or equivalent scholarly
competence in a relevant branch of chem/bioinformatics or computer
science and a strong interest in informatics and data integration is
required. Required competences include web programming, databases, and
working knowledge in Java. Experience with linked data is desirable.

Deadline for application: May 9th, 2012.
Link to job ad and application form:
http://www.uu.se/jobb/others/annonsvisning?tarContentId=186211&languageId=1

___
Development mailing list
developm...@opentox.org
http://www.opentox.org/mailman/listinfo/development





Re: Fwd: Nature Publishing Group Linked Data Platform

2012-04-05 Thread Egon Willighagen
That they do not do these things yet sounds like there are a lot of
opportunities...

Egon
On 5 Apr 2012 at 17:41, Michel Dumontier michel.dumont...@gmail.com
wrote:

 In case you haven't seen, Nature PG now has LOD and a SPARQL endpoint :

 http://www.nature.com/press_releases/linkeddata.html

 unfortunately, after a cursory look ( hope i'm wrong) - i don't think the
 data links into anything on the semantic web... (mesh terms are literals,
 pmids are in NPG's namespace with no links to identifiers.org, etc)

 m.


Nature Publishing Group (NPG) today is pleased to join the linked data
 community by opening up access to its publication data via a linked data
 platform. NPG's Linked Data Platform is available at
 http://data.nature.com.

The platform includes more than 20 million Resource Description
 Framework (RDF) statements, including primary metadata for more than
 450,000 articles published by NPG since 1869. In this first release, the
 datasets include basic citation information (title, author, publication
 date, etc) as well as NPG specific ontologies. These datasets are being
 released under an open metadata license, Creative Commons Zero (CC0),
 which permits maximal use/re-use of this data.

NPG's platform allows for easy querying, exploration and extraction of
 data and relationships about articles, contributors, publications, and
 subjects. Users can run web-standard SPARQL Protocol and RDF Query Language
 (SPARQL) queries to obtain and manipulate data stored as RDF. The platform
 uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL,
 and the data is integrated with existing public datasets including CrossRef
 and PubMed.

More information about NPG's Linked Data Platform is available at
 http://developers.nature.com/docs. Sample queries can be found at
 http://data.nature.com/query. 

 --
 Michel Dumontier
 Associate Professor of Bioinformatics, Carleton University
 Chair, W3C Semantic Web for Health Care and the Life Sciences Interest
 Group
 http://dumontierlab.com




Re: LODD/BioRDF telcon NEXT WEEK (not today)

2012-04-02 Thread Egon Willighagen
On Mon, Apr 2, 2012 at 2:55 PM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 Having the next LODD/BioRDF telcon next week (NOT TODAY).
 I will send an agenda for next week's call in the next few days.

Next week Monday is Easter?

Egon




Re: LODD/BioRDF telcon NEXT WEEK (not today)

2012-04-02 Thread Egon Willighagen
On Mon, Apr 2, 2012 at 3:48 PM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 In the Netherlands it is a day off, yes. In Europe as well, I believe, but
 not in the U.S. I was hoping that we wouldn't lose too many people.. We can
 try for another day if that's the case.

OK, I'll try to be there too.

Egon




Re: BioRDF/LODD Monday 11AM ET / 3PM GMT / 4PM CET

2012-03-19 Thread Egon Willighagen
Scott,

On Mon, Mar 19, 2012 at 12:19 AM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 Tomorrow, a teleconference to discuss efforts to employ RDF for
 expression studies (hoping for a short tour of code from a few
 contributors) and follow up for LODD emerging practices note.

I am teaching at that time tomorrow. Sadly, I did not get around to
proofreading the document I said I would try :(

Egon




Re: [linkedlifew3cnote] Reminder: LODD telcon today Tuesday Nov. 22 at 11AM EDT (5PM CET)

2011-11-25 Thread Egon Willighagen
Scott,

On Tue, Nov 22, 2011 at 12:36 PM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 We are wrapping up a Google Doc version of the W3C note. Please call
 in to discuss it. We will be requesting comments from HCLS after this
 last iteration of edits.
 https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit

I went through the document during the call, and fixed a few typos...
but I also ran into the Q10 section, where we suggest void:license.
Two thoughts here. First, CC0, which we mentioned just before, is not a
license but a waiver; VoID, however, specifically allows that
combination. Second, VoID does not define a license predicate in its own
namespace, but reuses DCTerms for that (dcterms:license), according
to http://www.w3.org/TR/void/#license. I have updated the text for
that.
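
To make the point concrete, here is a small sketch that emits a minimal VoID description in Turtle, using dcterms:license (not a void: predicate) to point at the CC0 waiver. The dataset URI and the helper name are assumptions for illustration; only the two namespaces and the CC0 URI are the real ones.

```python
CC0_URI = "http://creativecommons.org/publicdomain/zero/1.0/"


def void_description(dataset_uri: str, license_uri: str) -> str:
    """Return a minimal Turtle snippet describing a dataset and its license or waiver."""
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        "\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f"    dcterms:license <{license_uri}> .\n"
    )


# The example.org URI is a placeholder, not a real dataset.
turtle = void_description("http://example.org/dataset/demo", CC0_URI)
```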

Grtz,

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: provenance questionnaire, v2

2011-09-06 Thread Egon Willighagen
On Thu, Sep 1, 2011 at 11:42 PM, Deus, Helena helena.d...@deri.org wrote:
 For those of you who haven’t answered and would like to give your 2c about
 how provenance should be dealt with on the semantic web, here’s your chance!

Authorization would probably not be considered provenance, but I was
wondering if the WG has been talking about that, and if there is an
existing ontology that would be suitable for that, compatible with the
provenance ontology... it's clear that at least the depositors
(provenance) have authorization, so compatibility at that level seems
needed... Or?

Egon





Re: provenance questionnaire, v2

2011-09-06 Thread Egon Willighagen
On Tue, Sep 6, 2011 at 11:18 AM, Deus, Helena helena.d...@deri.org wrote:
 I will forward your concerns to the provenance workgroup.

Well, authorization is going to be a big thing in our EU project...
various reasons for that, social, contractual, political. That's just
the way it is. I can elaborate further on our needs, if that is useful
to the WG.

Egon




Re: Database versioning and maintenance

2011-08-03 Thread Egon Willighagen
Hi Peter,

On Wed, Aug 3, 2011 at 1:02 PM, Peter Ansell ansell.pe...@gmail.com wrote:
 On your question about Chembl in Bio2RDF, we currently directly use
 Egon's sparql endpoint to provide access to it, but we can easily
 switch, thanks to the way the server can be configured. If John
 Overington is publishing RDF, (preferably using a SPARQL endpoint and
 scripts so that others can regenerate the RDF if they need to based on
 the raw data), then we should be able to transparently switch Bio2RDF
 to using that dataset, barring unresolvable changes in the dataset
 structure and identifiers.

I am working towards an updated version, based on ChEMBL 10. I will
make this available via the same SPARQL server as now, but also via
Kasabi for those who need a more reliable service (though the
rdf.farmbio.uu.se uptime has been quite OK :).

Egon




Re: Version 1.0 of Bio2RDF and Chembl webapps released

2011-06-30 Thread Egon Willighagen
On Thu, Jun 30, 2011 at 8:58 AM, Peter Ansell ansell.pe...@gmail.com wrote:
 The 1.0.1 version of the Bio2RDF server software has been released on
 Sourceforge. The software is designed to be a Linked Data interface to
 a range of RDF datasources, with the current examples being Bio2RDF
 and Chembl. (There was a small bug fix needed to enable endpoint
 round-robin between 1.0.0 and 1.0.1.)

and

 [2] 
 http://sourceforge.net/projects/bio2rdf/files/chembl-server/chembl-webapp-1.0.1/

Peter, is that app running online somewhere too? What's the link to that?

Egon




Re: LODD telcon on Wed at 11AM ET / 5PM CET

2011-06-15 Thread Egon Willighagen
Hi all,

I have not been able to find a way to sit behind a desk right now. I
will have internet access, but an unstable one. I'll join via IRC, via
which I will be available to give an update on ChEMBL-RDF. In short,
it comes down to:

1. http://rdf.farmbio.uu.se/chembl/sparql has ChEMBL 09
2. but ChEMBL 10 has just been released, with a lot more data; I'll update soon
3. the new RDF has SMILES + InChI, and links out to
http://rdf.openmolecules.net/
4. the new RDF uses the CHEMINF ontology, see
http://bioportal.bioontology.org/ontologies/1444 which I have
collaborated on, with Janna Hastings (EBI, UK) as main developer, and
also Michel Dumontier

(I earlier already linked out to Bio2RDF)

Egon

On Mon, Jun 6, 2011 at 6:29 PM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 Brief update LODD datasets - Anja, Oktie, Egon






Re: Good news from the EBI Semantic Web Industry Workshop and LODD members

2011-05-25 Thread Egon Willighagen
On Wed, May 25, 2011 at 12:56 AM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 * Egon Willighagen is updating the downloadable RDF version of ChEMBL
 to version 9

This download is available from:

https://github.com/egonw/chembl.rdf

(click the Download button)

What I have not done yet is to update the online SPARQL end point at:

http://rdf.farmbio.uu.se/chembl/sparql/

That is still using ChEMBL 02 (I'll try to update that soon).

Egon




Re: Fwd: [open-science] LODD Hack Session Notes - Is It Open request signatories needed

2011-03-09 Thread Egon Willighagen
Hi Matthias,

On Wed, Mar 9, 2011 at 10:46 AM, Matthias Samwald samw...@gmx.at wrote:
 I'm not sure if clear-cut rules for LODD have been defined. However, many
 people interested/involved in LODD come from commercially oriented companies
 (mostly pharmaceutical companies). Therefore it certainly IS a reason for
 concern if 5 out of 12 datasets disallow commercial use without permission.

Agreed. It may also be relevant to all those research institutes that
also have commercial activities, many of which have mixed funding
from national and EU projects, but also sell consultancy, etc.

 It would certainly be helpful to convince these data providers of removing
 the NC clause, but it seems unlikely.

Indeed. This is why that letter was supposed to be informative, rather
than requesting that the clause be dropped. At this moment, I am not aware
that anyone has challenged a company for using data with a NC clause,
but this is bound to happen.

 Looking at the list of datasets with
 NC clauses (including Drugbank, LinkedCT, major parts of SIDER, STITCH), I
 get the feeling that the providers did not choose to include NC clauses on a
 whim.

Agreed.

 I guess the best we can realistically do for these datasets is to
 improve the visibility of these licensing restrictions for people that want
 to use them.

Yes, and that's an actual LODD activity we discussed about half a year
ago, and which was the first half of the work done in the hack
session: just getting clear what the actual terms of use are :) For
three they are unclear, and we will seek clarification for those.
That's the three letters being referred to in Jenny's email.

Egon


-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: Fwd: [open-science] LODD Hack Session Notes - Is It Open request signatories needed

2011-03-09 Thread Egon Willighagen
Hej Amrapali,

On Wed, Mar 9, 2011 at 2:31 PM, Amrapali J Zaveri
amrapali.j.zav...@gmail.com wrote:
 To answer the question about the licensing/copyright issue, according to
 WHO, if extracts from WHO website of publication are used for research,
 private study or in a noncommercial document with limited circulation (such
 as an academic thesis or dissertation), then it is allowed to do so without
 seeking permission [3]. WHO encourages the use of its information materials
 for information purposes i.e. when the purpose of the use is to share
 objective information, whether free of charge or for sale. Only if the
 material is to be used for commercial purposes, it requires a license.

First of all, thanx for sending these pointers!

If I read this, however, I am tempted to think that this data is
really not Open. Making the whole available as RDF does not sound like
'extracts' or 'limited circulation' to me...

 However it does have a copyright notice: Copyright World Health Organization
 (WHO), 2011. All Rights Reserved [4], so maybe this copyright could be
 added.

Yes.

 Also, the dataset is available at this SPARQL
 endpoint: http://db0.aksw.org:8895/sparql and can be downloaded from
 here: http://aksw.org/Projects/Stats2RDF#h13390-5 .
 Hope that is sufficient information regarding the open-ness of the dataset.
 Let me know if any other information is required and suggestions are welcome
 :)

Have you asked permission to 'circulate' the whole of the database?

I think Jenny and I will have to update the letter with respect to
these new details, and I'm really happy you pointed me to [3].

Egon




Fwd: [open-science] LODD Hack Session Notes - Is It Open request signatories needed

2011-03-08 Thread Egon Willighagen
Hi LODD wg members,

Jenny Molloy sent around the results of the hack session with the
science working group of the Open Knowledge Foundation, looking at the
12 data sets listed in the best practices paper.

Of these, two were clearly Open (ChEMBL, TCM-GenEdit), and one clearly
not Open (UMLS). Diseasome seems to come from the OMIM data which is
also not Open.

Five databases have a non-commercial clause involved, making it Open
according to the LODD definitions (correct?), but not Open following
the OKFN's standards. The original plan was to set up an informative
package of information explaining why the NC clause causes problems,
but we did not get around to this. From a LODD perspective, this is a
non-issue, as I understood (I have not been around when LODD was
defined).

SIDER and STITCH have CCZero components and parts covered by NC, but
the SPARQL end point is unclear in what parts it makes available.

That leaves three datasets where we have not been able to find a clear
licensing/copyright/waiver statement. For these, three letters have
now been written (see Jenny's email) to inquire under what conditions those
data sets can be redistributed, which the LODD wg is already doing.
This involves the DailyMed, RXNorm, and WHO-GHO data sets. Input from
those who composed them would be helpful here. One thing that we want to
get clear is if people can pull the data from the SPARQL end point,
use/modify it, and even redistribute it.
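
The outcome summarized above can be written down as a small lookup, sketched here under the assumption that one coarse status label per data set is enough. The labels and the helper name are illustrative; only data sets explicitly named in this message are included.

```python
# Status labels are illustrative; the assignments follow the summary in this message.
STATUS = {
    "ChEMBL": "open",
    "TCM-GenEdit": "open",
    "UMLS": "not open",
    "SIDER": "mixed (CC0 and NC parts)",
    "STITCH": "mixed (CC0 and NC parts)",
    "DailyMed": "unclear",
    "RXNorm": "unclear",
    "WHO-GHO": "unclear",
}


def needs_letter(status: str) -> bool:
    """Data sets without a clear statement get an 'Is It Open Data?' request."""
    return status == "unclear"


to_contact = sorted(name for name, s in STATUS.items() if needs_letter(s))
```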

All in all, I think the (2 hour in the end) hack session was
productive, and the licensing information has been updated in CKAN,
where we also identified some wishes for improvements for CKAN, but
that will be brought up on the CKAN mailing list.

Thanx to all who were there and helped iron out licensing unclarities
and helped with the letter.

The final letters are linked to below, and if you helped on Monday
then please do sign!

Egon

-- Forwarded message --
From: Jenny Molloy jcmcoppic...@gmail.com
Date: Wed, Mar 9, 2011 at 1:18 AM
Subject: [open-science] LODD Hack Session Notes - Is It Open request
signatories needed
To: open-scie...@lists.okfn.org


Dear All

We had a very productive hack session on Monday night regarding linked
open drug data. You can see the full notes here:
http://okfnpad.org/sciencewg-loddhack-201103

In summary, we reviewed the openness of several LODD data sets in CKAN
and identified those whose maintainers should be sent an Is It Open
Data? request. We drafted letters to send to the World Health
Organisation Global Health Observatory and the maintainers of two
datasets at the US National Library of Medicine:
http://okfnpad.org/sciencewg-who-letter
http://okfnpad.org/sciencewg-rxnorm-letter
http://okfnpad.org/sciencewg-nlm-letter

Before we send them via http://www.isitopendata.org/, it would be
great to get more signatories from the group, so please add your name
to the end of the generic letter on
http://okfnpad.org/sciencewg-loddhack-201103 if you are happy to be
included. Unfortunately, we didn't remind all of the hack session
participants to do this before they left, so if you helped on Monday
then please do sign!

We will be sending the letters on Monday 14th March during a follow up
session, of which more details are to follow.

If there is a group on CKAN, or a general topic area that you feel
would be a good target for future sessions of this nature, then please
let me know!

Jenny



___
open-science mailing list
open-scie...@lists.okfn.org
http://lists.okfn.org/mailman/listinfo/open-science







Re: Work/Hack session on LODD / IsItOpenData

2011-02-24 Thread Egon Willighagen
On Sun, Feb 20, 2011 at 7:06 PM, Egon Willighagen
egon.willigha...@gmail.com wrote:
 Please let me know if interested in joining the meeting,

Thanx to the many respondents!

I have set up a preliminary etherpad with details:

http://okfnpad.org/sciencewg-loddhack-201103

Please take notice of those... the meeting is just one hour, so the
focus, I hope, will be on writing 'Is it Open Data?' letters, rather
than being an informative/planning meeting. It's a true hack session.
The letter may serve as example to other projects.

If you like to join this hack session, you can just add your name and
Skype account to the etherpad.

Egon




Re: Reminder: LODD telcon today

2011-02-23 Thread Egon Willighagen
Hej Scott,

On Wed, Feb 23, 2011 at 4:51 PM, M. Scott Marshall
mscottmarsh...@gmail.com wrote:
 Agenda

 Best practices document for mapping life sciences data
 Data updates

I would like to bring up the meeting scheduled in March together with the
OKF on looking at the Open Data nature of LODD data sets (and quite
possibly others too). I have set up a group at CKAN:

http://ckan.net/group/hclsig_lodd

which currently lists the 12 sets from the Best practices paper.

I can't make the call, but will hang out on IRC and in the GDoc...

Egon






Work/Hack session on LODD / IsItOpenData

2011-02-20 Thread Egon Willighagen
Hi all,

on February 16 there was an OpenScience Workgroup [0] meeting of the
Open Knowledge Foundation, where we scheduled a work session for
Monday March 7th at 19:00 GMT, for about an hour, to work on
clarifying the data licensing of LODD data sets, such as those listed
on CKAN. For some data sets the license is clear, others have
non-commercial clauses (which is not considered Open Data), and
others do not specify the terms, or are not open at all. For example,
the diagram below has a CAS node which is not open data at all:

http://www.w3.org/wiki/HCLSIG/LODD/Data

In fact, the unclear license issue was one of the referee comments on
the LODD contribution to the thematic issue on RDF in chemistry
(coordinated by Matthias).

Please let me know if interested in joining the meeting,

Egon

0. http://science.okfn.org/




CC0 RDF hosting service

2010-12-29 Thread Egon Willighagen
Hi all,

Mark Hahnel has extended his Science 3.0 website with a CC0 RDF
hosting service. The purpose is small data sets, like results from
experiments.

http://www.science3point0.com/opendata/

It is still new, and most certainly not in its final form. Yet, this
should be of such interest to the Open Science community that I had to
forward the news here.

The full announcement can be found at:

http://www.science3point0.com/blog/2010/12/29/cc0-rdf-hosting-for-scientists/

With kind regards,

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Research Associate
University of Cambridge
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: LODD Telcon

2010-11-10 Thread Egon Willighagen
On Wed, Nov 10, 2010 at 4:45 PM, Matthias Samwald samw...@gmx.at wrote:
  (keeping in mind that we have a limit
 of only 1500 words for the main section).

I was informed that we should consider this a guideline and are free
to use more words if needed.

Egon





Re: LODD Telcon

2010-10-27 Thread Egon Willighagen
On Mon, Oct 25, 2010 at 3:32 PM, Susie Stephens
susie.steph...@gmail.com wrote:
 Here's the reminder for Wednesday's LODD telcon.

I'll have to cancel too... birthday... family obligations (forgot
it... well, ignored it, I guess)

Data update:
* no new RDF from my side
* other drug-related data: QM-calculated 3D structures:
http://quixote.wikispot.org/, they will output RDF in their workflow

JChemInf Paper:
* Preliminary Communication is OK: details -
http://www.jcheminf.com/info/instructions/?txt_jou_id=10170&txt_mst_id=121837
* I would like to see a federated SPARQL query for drug data... but haven't
managed to sit down for that yet :(
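
A rough sketch of what such a federated query might look like, using the SERVICE keyword from SPARQL 1.1 Federated Query. The owl:sameAs join and the choice of remote endpoint are assumptions for illustration, not a tested query against real drug data; the helper only builds the query text.

```python
def federated_query(remote_endpoint: str) -> str:
    """Return a SPARQL 1.1 query joining local data with a remote endpoint via SERVICE."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT ?drug ?same ?p ?o WHERE {\n"
        "  ?drug owl:sameAs ?same .\n"       # local links (assumed to exist)
        f"  SERVICE <{remote_endpoint}> {{\n"  # this part is evaluated remotely
        "    ?same ?p ?o .\n"
        "  }\n"
        "}\n"
        "LIMIT 10\n"
    )


# The ChEMBL endpoint mentioned elsewhere in this archive is used as an example target.
query = federated_query("http://rdf.farmbio.uu.se/chembl/sparql")
```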

Apologies for missing this important call!

Egon




Re: use of RDF in chemistry

2010-09-09 Thread Egon Willighagen
On Thu, Sep 9, 2010 at 6:27 PM, Susie Stephens susie.steph...@gmail.com wrote:
 Chem2bio2rdf might have some data that they could share with you. Dave Wild
 is the PI for the project.

And he was one of the speakers...

Egon


-- 
Dr E.L. Willighagen
Post-doc @ Uppsala University (only until 2010-09-30)
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



use of RDF in chemistry

2010-09-08 Thread Egon Willighagen
Hi all,

two weeks ago I organized with Martin Braendle (ETH/Zurich) a 1.5 day
symposium on the use of RDF in chemistry at the American Chemical
Society meeting in Boston, and was very happy that Eric Prud'hommeaux
was there to (re)present HCLS / LODD.

Several slide sets are now available online [0].

With kind regards,

Egon Willighagen

0.http://egonw.github.com/acsrdf2010/

-- 
Dr E.L. Willighagen
Post-doc @ Uppsala University (only until 2010-09-30)
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



ESW wiki: Semantic Web extensions?

2010-08-21 Thread Egon Willighagen
Hi all,

the ESW wiki is running MediaWiki, but not Semantic MW, right? Could
it be an option to actually get that installed and, umm... eat our own
dog food? Samuel (my former student) worked on RDFIO during his
Google Summer of Code project, and it should even be possible to get a
SPARQL end point on the wiki that way:

http://saml.rilspace.org/rdfio-040-released-gsoc-finished

The context was a recent question about HCLS WG participation:

* how many participants are there, and from how many different organizations?
* how many of those organizations are academic and how many industrial?

Would be cool to just pull that out from the ESW wiki pages, not?

Egon

-- 
Dr E.L. Willighagen
Post-doc @ Uppsala University (only until 2010-09-30)
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Program for the RDF symposium at the American Chemical Society fall meeting

2010-06-26 Thread Egon Willighagen
Dear all,

it is my great pleasure to present the full symposium program for the
RDF session at the American Chemical Society at the Boston meeting in
August:

http://egonw.github.com/acsrdf2010/

I am excited that on Monday afternoon Eric Prud'hommeaux will present
the work of the LODD working group to the chemistry community. The
symposium contains three half day sessions with topics on computing,
ontologies, and applications, all chemistry oriented. The goal of the
meeting is to get together people using RDF technologies in chemistry,
and the list of talks from around the world shows that this goal has
been reached. The program is diverse and exciting, and I am very much
looking forward to meeting all participants to discuss challenges and
cool solutions.

People interested in joining, can sign up to the meeting mailing list,
linked to on the homepage. Besides that the webpage is in XHTML+RDFa,
the source is also available on GitHub (well, you really download the
source code anyway), allowing people to happily fork, make changes,
and perhaps make the page as triple-dense as is possible.

Hoping to have informed you well,

with kind regards from Uppsala,

Egon Willighagen

-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: esw wiki changed to media wiki

2010-04-06 Thread Egon Willighagen
On Tue, Apr 6, 2010 at 11:04 PM, Eric Prud'hommeaux e...@w3.org wrote:
 The long-awaited ESW wiki change from MoinMoin to Media Wiki has
 finally occurred. Media Wiki is the wiki upon which Wikipedia is
 based. It has tons of cool modules and is well-maintained. This
 change affects authentication and spam control.

It is not a Semantic MediaWiki [0], or is it?

Egon

0.http://semantic-mediawiki.org/wiki/Semantic_MediaWiki

-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Last Call for Papers: RDF ACS conference, Boston 2010

2010-03-24 Thread Egon Willighagen
Dear all,

here a quick reminder that in four days, the 28th, is the deadline of
the abstract submission of the ACS RDF conference, as part of the
chemical information (CINF) section of the American Chemical Society.
With this email I like to provide some further information, including
the scope of the meeting, the 2nd Call for Papers (at the end), and
the structure of a typical abstract.

The ACS meetings typically have more than 10.000 participants, though
the number of people attending the CINF symposia during those meetings
is around 100. CINF meetings typically are about chemical information
in general, though drug discovery applications take a prominent place.
Other CINF symposia in Boston include:

* Streamlining systems biology and cheminformatics approaches with
high-throughput screening in lead discovery
* The Emerging Concepts of Activity Landscapes and Activity Cliffs and
Their Role in Drug Research
* Leveraging Modeling and Informatics for Rare and Neglected Diseases
* Where's the good stuff?  Consumer health information social
networking, resources and services
* Biologics and Biosimilars: One in the Same?
* Data-intensive drug design

Symposia of the ACS Spring meeting ongoing right now in San Francisco, include:

* Green Chemistry: Multidisciplinary use of chemical information resources
* Data visualization
* Libraries and large scale digitization initiatives (LSDIs)
* Metabolomics
* The Future of Scientific Publishing

As should be clear, the scope is very broad. I'd say the many HCLS
activities are relevant. The meeting is oriented at getting the
various groups involved in RDF applications in chemical and molecular
sciences together, perhaps as suggested best practices.

Abstracts can consist of a single A4 paper with title, authors and
affiliation, and perhaps a few references to relevant literature.

Further information about the full meeting can be found at:

http://portal.acs.org/portal/acs/corg/content?_nfpb=true&_pageLabel=PP_ARTICLEMAIN&node_id=2060&content_id=CNBP_023925&use_sec=true&sec_url_var=region1&__uuid=70ff72d1-db73-471b-8855-f3b5f1c4fba3

Looking forward to your abstract submissions,

with kind regards,

Egon Willighagen

--
2nd Call for Papers: Semantic Chemistry with the Resource Description
Framework (CINF Symposium, ACS Autumn 2010) 240th ACS National Meeting &
Exposition

We now invite papers for our symposium on the use of the Resource
Description Framework (RDF) technologies in semantic knowledge
representation and data exchange in chemistry at the 240th National
Meeting & Exposition of the American Chemical Society (ACS) in Boston
this fall.

Semantic Chemistry has been around for a while, but is seeing a
revival with the adoption of the Resource Description Framework (RDF)
and matching technologies in chemistry. RDF triples provide a simple
structure that allows data and knowledge alike to be presented in a
single framework. Derived technologies include the capturing of
ontologies with the Web Ontology Language (OWL) and performing queries
with SPARQL. A wide variety of free and open source products make it
easy to set up servers with large amounts of RDF data, while
integration with HTML is available too with RDFa.

The RDF symposium at the 240th ACS national meeting in Boston invites
submissions of talks about the use of RDF in chemistry and
cheminformatics. Topics could include the use of OWL ontologies, OWL
axioms, reasoning and inference, RDF in user interfaces, such as
RDFa in web front ends, visualization, querying systems, and
applications thereof, such as linking data sets, compound
classification, cloud computing, web services, data aggregation,
semantic publishing, and literature mining.

Abstracts may be submitted via http://abstracts.acs.org/ now. You’ll
find the RDF session as part of the CINF division symposiums. The
submission deadline is March 28, 2010. In case of questions, please
email Egon Willighagen at egon.willighagen[A]farmbio.uu.se or Martin
Braendle at braendle[A]chem.ethz.ch.
--


-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



2nd Call for Papers: Semantic Chemistry with the Resource Description Framework (CINF Symposium, ACS Autumn 2010) 240th ACS National Meeting & Exposition

2010-03-20 Thread Egon Willighagen
2nd Call for Papers: Semantic Chemistry with the Resource Description
Framework (CINF Symposium, ACS Autumn 2010) 240th ACS National Meeting &
Exposition

We now invite papers for our symposium on the use of the Resource
Description Framework (RDF) technologies in semantic knowledge
representation and data exchange in chemistry at the 240th National Meeting
& Exposition of the American Chemical Society (ACS) in Boston this fall.

Semantic Chemistry has been around for a while, but is seeing a revival with
the adoption of the Resource Description Framework (RDF) and matching
technologies in chemistry. RDF triples provide a simple structure that allows
data and knowledge alike to be presented in a single framework. Derived
technologies include the capturing of ontologies with the Web Ontology
Language (OWL) and performing queries with SPARQL. A wide variety of free
and open source products make it easy to set up servers with large amounts of
RDF data, while integration with HTML is available too with RDFa.

The RDF symposium at the 240th ACS national meeting in Boston invites
submissions of talks about the use of RDF in chemistry and cheminformatics.
Topics could include the use of OWL ontologies, OWL axioms, reasoning and
inference, RDF in user interfaces, such as RDFa in web front ends,
visualization, querying systems, and applications thereof, such as linking
data sets, compound classification, cloud computing, web services, data
aggregation, semantic publishing, and literature mining.

Abstracts may be submitted via http://abstracts.acs.org/ now. You’ll find the
RDF session as part of the CINF division symposiums. The submission deadline is
March 28, 2010. In case of questions, please email Egon Willighagen at
egon.willighagen[A]farmbio.uu.se or Martin Braendle at braendle[A]chem.ethz.ch.

-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Fwd: [open-science] Launch of the Panton Principles for Open Data in Science + Is It Open Data?

2010-02-19 Thread Egon Willighagen
Hi LODD friends,

I think the below post is of interest to our recent discussion...

I would especially like to point to the service listed at the bottom of the post:

http://www.isitopendata.org/

Egon

-- Forwarded message --
From: Jonathan Gray jonathan.g...@okfn.org
Date: Fri, Feb 19, 2010 at 11:59 AM
Subject: [open-science] Launch of the Panton Principles for Open Data
in Science + Is It Open Data?
To: open-science open-scie...@lists.okfn.org


Hi all,

We're pleased to announce the Panton Principles for Open Data in Science:

 http://blog.okfn.org/2010/02/19/launch-of-the-panton-principles-for-open-data-in-science/
 http://www.pantonprinciples.org/

The Panton Principles were authored by Peter Murray-Rust, Cameron
Neylon, Rufus Pollock and John Wilbanks at the Panton Arms on Panton
Street in Cambridge, UK - with input from the Working Group on Open
Data in Science.

You can endorse the principles at:

 http://www.pantonprinciples.org/endorse

We'd be most grateful for any help disseminating the principles - by
blogging, microblogging, forwarding to relevant colleagues and so
forth.

The 'Is It Open Data?' service, which allows anyone to make and
publicly record enquiries about the openness of (scientific) datasets,
is also now live at:

 http://www.isitopendata.org/

All the best,

--
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://blog.okfn.org

http://twitter.com/jwyg
http://identi.ca/jwyg

___
open-science mailing list
open-scie...@lists.okfn.org
http://lists.okfn.org/mailman/listinfo/open-science



-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



LODD: new ChEMBL SPARQL end point, requirements for embedding in LODD?

2010-02-09 Thread Egon Willighagen
Hi all,

as indicated on the LODD call, I had some trouble getting the ChEMBL
SPARQL end point somewhat faster. The end point has moved to a new, more
powerful Virtuoso server, also with saner server settings:

http://chem-bla-ics.blogspot.com/2010/02/chembl-rdf-1sparql-end-point.html

The RDF graph links out to Bio2RDF, and the obvious next step is to link
it into the LODD network...

However, I do not really seem to find requirements... I assume the
regular LD [0] rules apply? The ChEMBL data is only available as
SPARQL right now, but it is a start... (the rest is a bit of PHP
wrapping. BTW, is anyone aware of a simple PHP lib for that? It's not
much code, but I'd rather reuse anyway.)...

Second thing is how to provide links... the InChI is mentioned
(mis-capitalized at [1], which makes me wonder who I should ask for
permission to make (minor) changes to the wiki), but many DBs do not
have URIs which include the InChI, making it rather difficult to link
to that resource in a general, independent way... that is, there is no
[2] matching 
http://www4.wiwiss.fu-berlin.de/drugbank/snorql/?describe=http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00010.
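
A minimal sketch of the linking problem described above: a URI that embeds an InChI needs percent-encoding, because an InChI contains characters such as "/", "=" and "(" that are reserved in URIs. The base URI http://example.org/resource/ and both helper names are hypothetical; no LODD data set is known to use them.

```python
from urllib.parse import quote, unquote

# Hypothetical base URI; no real LODD resource uses this namespace.
BASE = "http://example.org/resource/"

def inchi_to_uri(inchi: str) -> str:
    """Percent-encode the whole InChI so it fits in one URI path segment."""
    # safe="" makes quote() also encode "/", which separates InChI layers.
    return BASE + quote(inchi, safe="")

def uri_to_inchi(uri: str) -> str:
    """Recover the InChI from a URI minted by inchi_to_uri()."""
    return unquote(uri[len(BASE):])

methane = "InChI=1/CH4/h1H4"
uri = inchi_to_uri(methane)
print(uri)
```

Because the encoding is reversible, any database that knows the InChI of a compound could compute the same URI independently, which is the general, independent kind of link asked for here.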

More general, is there a requirement and/or policy on how the
resources should be linked up?

Looking forward to hearing from you,

Egon

0.http://www.w3.org/DesignIssues/LinkedData.html
1.http://esw.w3.org/topic/HCLSIG/LODD/Data/
2.http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/InChI=1/C149H246N44O42S/c1-20-77(13)116(191-122(211)81(17)168-132(221)104(66-113(204)205)178-121(210)79(15)167-123(212)88(152)62-84-39-43-86(198)44-40-84)145(234)185-102(63-83-32-23-22-24-33-83)138(227)193-118(82(18)197)146(235)186-103(65-111(155)202)137(226)189-108(71-196)142(231)182-101(64-85-41-45-87(199)46-42-85)136(225)175-93(38-31-56-165-149(161)162)126(215)174-91(35-26-28-53-151)131(220)190-115(76(11)12)143(232)184-97(58-72(3)4)124(213)166-68-112(203)170-94(47-49-109(153)200)128(217)180-100(61-75(9)10)135(224)188-106(69-194)140(229)169-80(16)120(209)172-92(37-30-55-164-148(159)160)125(214)173-90(34-25-27-52-150)127(216)179-99(60-74(7)8)134(223)181-98(59-73(5)6)133(222)176-95(48-50-110(154)201)129(218)183-105(67-114(206)207)139(228)192-117(78(14)21-2)144(233)177-96(51-57-236-19)130(219)187-107(70-195)141(230)171-89(119(156)208)36-29-54-163-147(157)158/h22-24,32-33,39-46,72-82,88-108,115-118,194-199H,20-21,25-31,34-38,47-71,150-152H2,1-19H3,(H2,153,200)(H2,154,201)(H2,155,202)(H2,156,208)(H,166,213)(H,167,212)(H,168,221)(H,169,229)(H,170,203)(H,171,230)(H,172,209)(H,173,214)(H,174,215)(H,175,225)(H,176,222)(H,177,233)(H,178,210)(H,179,216)(H,180,217)(H,181,223)(H,182,231)(H,183,218)(H,184,232)(H,185,234)(H,186,235)(H,187,219)(H,188,224)(H,189,226)(H,190,220)(H,191,211)(H,192,228)(H,193,227)(H,204,205)(H,206,207)(H4,157,158,163)(H4,159,160,164)(H4,161,162,165)/f/h157,159,161,163-193,204,206H,153-156,158,160,162H2

-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Replies from Bio2RDF about contact with upstream data providers

2010-01-06 Thread Egon Willighagen
Hi all,

happy new year!

On Mon, Jan 4, 2010 at 3:44 PM, Susie Stephens susie.steph...@gmail.com wrote:
 == Agenda ==
  * Open data follow up - all
  * Data update - Anja, Jun, Matthias, Egon

I have had email with Peter Ansell of the Bio2RDF project and
copy/pasted replies below.

***
Are you in contact with upstream providers? E.g. are they aware you
rdf-ied their data?
***

Only in cases where they do not offer licenses that we can use without
telling them as far as I know. Some, like the full NLM pubmed license
require that we ask, so in those cases they know.

***
How do you propagate licenses and copyright? I know you have the
data blobs nicely separated, so no problems with license
incompatibility, but I did not see copyright/license statements
mentioned on the RDF pages (or HTML conversion), nor in the list at [0].
Will copyright/license information be added to that list at [0]?

0. http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Namespace
***

Quite a few of the pages, but not all, have a triple added that
indicates where the license is to be found. We use the
http://creativecommons.org/ns#license predicate to indicate the
license, even if the license is not a CC license. It fits better IMO
than dc:license and definitely better than xhtml:license.

See http://bio2rdf.org/go:345 for an example, with the license
redirect URL http://bio2rdf.org/license/go:345 redirecting in
this case to http://www.geneontology.org/GO.cite.shtml. We do a
redirect to the license because that is the easiest method, not that
we couldn't do it directly. I prefer to have the ability to redirect
licenses based on both the namespace and the identifier, particularly
in the case of SIDER for example, where there are two datasets with
different licenses in the same namespace because that is how it
works.
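
The license triple described above could be sketched as follows. The predicate and the two URL patterns are taken from the go:345 example in this reply; the function name and the N-Triples output format are assumptions made for illustration, not the actual Bio2RDF code.

```python
# Predicate named in the reply above.
CC_LICENSE = "http://creativecommons.org/ns#license"

def license_triple(namespace: str, identifier: str) -> str:
    """Return one N-Triples line linking a record to its license redirect URL."""
    # Both URL patterns follow the go:345 example; the helper is hypothetical.
    subject = f"http://bio2rdf.org/{namespace}:{identifier}"
    license_url = f"http://bio2rdf.org/license/{namespace}:{identifier}"
    return f"<{subject}> <{CC_LICENSE}> <{license_url}> ."

print(license_triple("go", "345"))
```

Keeping both the namespace and the identifier in the license URL is what would allow two data sets in one namespace, as with SIDER, to redirect to different licenses.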

The current list that is used to autogenerate the license triples,
although it should definitely be expanded, can be found in RDF at
http://bio2rdf.svn.sourceforge.net/viewvc/bio2rdf/trunk/src/war/WEB-INF/base-bio2rdf-providers-licenses-config.n3?view=markup
All of the providers there, insert the static RDF/XML that is defined
at http://qut.bio2rdf.org/query:license, but another query could be
used if there were specific conditions for particular datasets, as
there will be with pubmed soon.

***
Do upstream providers have preferences regarding how you put in the license?
***

Not that I know of in most cases. The 2009 Pubmed License has a few
new provisions though, so there are some cases that have different
providers.

***
Have you talked with upstream providers about changing licenses to
reduce license conflicts?
***

I can understand providers not wanting you providing their actual
information in RDF, but I can't understand them thinking that they can
have control over how people relate their personal datasets to their
information in small amounts. If the linking is major then we could be
in the situation that CAS tried to get into with Wikipedia, with CAS
giving Wikipedia a special agreement.
http://www.cas.org/newsevents/caswikipedia.html What they don't
realise is that Wikipedia releases the information under the same
license so it is totally free from that point on, and CAS cannot go
back on the agreement if anyone can prove that they helped with the
CAS number insertions on Wikipedia.

***
Do all upstream databases provide open/free licensing?
***

I only found three databases that we are currently offering for
download that I will have to check up with Marc-Alexandre about
the license conditions [...].

The majority of the databases seem to have the equivalent of CC-BY-NC
on it, although they don't actually use Creative Commons licenses.

--

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: LODD Telcon

2010-01-06 Thread Egon Willighagen
On Wed, Jan 6, 2010 at 5:59 PM, Susie Stephens susie.steph...@gmail.com wrote:
 Minutes from the LODD telcon are now available.
 http://esw.w3.org/topic/HCLSIG/LODD/Meetings/2010-01-06_Conference_Call

I heard about the conflict of universities trying to protect
commercial interests *and* use CC-NC material on twitter, and asked
for the link, which is:

http://brains.parslow.net/node/1581

So, the details seem to be in the fact that some universities want to
be qualified as 'commercial'...

On the call there was talk that for-research-only it should be OK, but
I do not believe redistributing still qualifies as
for-research-only...

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: LODD Minutes

2009-11-25 Thread Egon Willighagen
On Wed, Nov 25, 2009 at 5:56 PM, Susie Stephens
susie.steph...@gmail.com wrote:
 Minutes from today's LODD call are now available.
 http://esw.w3.org/topic/HCLSIG/LODD/Meetings/2009-11-25_Conference_Call

Did my email of yesterday not reach the list?

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: LODD Telcon

2009-11-23 Thread Egon Willighagen
Hi all,

next Wednesday I unfortunately cannot participate because of family obligations.

On Mon, Nov 23, 2009 at 5:19 PM, Susie Stephens
susie.steph...@gmail.com wrote:
 Here's the reminder for Wednesday's LODD telcon.

I was up for a data update, so I will have to do it like this... my
introduction to this list is ancient, so first a recap. My background is
cheminformatics and chemometrics (statistics/data analysis on chemical
data). I'm a strong believer in Open Data, Open Source and Open
Standards, and (past) developer of several projects, including
Strigi-chemical (chemistry extension for the KDE desktop search
engine), the Chemistry Development Kit, JChemPaint, Jmol, and
several other ones. Right now, I am postdoc in a drug discovery group
at Uppsala University (Prof. Wikberg) and developing the
cheminformatics use at the department, which includes the Bioclipse
workbench.

Proteochemometrics is the main statistical method used in our group,
and model validation is clearly important. This is where RDF comes in:
aggregation of data before model building, and for model validation
afterwards. The latter will preferably be data which is related to the
model, and not really of the same type. RDF is clearly one of the few
methods up to this job.

When I first joined the HCLS mailing list and conf calls, I saw very
much focus on biological data, clinical data, but a lack of focus on
the molecular chemistry behind all, which is actually crucial for the
cheminformatics and proteochemometrics.

So, that more or less defines the area where I contribute to the RDF
activities... the border of molecular data and drug-related
properties.

So far, I have developed an extension for Bioclipse to deal with RDF,
and it currently supports an in memory triple store, SPARQL queries on
the in memory stores as well as on remote SPARQL end points. Like the
most of Bioclipse2, it is scriptable, which allows easy building of
small programs or workflows to integrate RDF into other Bioclipse
extension, including the cheminformatics functionality, but also Jmol.
There is also an R interface, to bridge with statistical modeling.
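
As a rough illustration of what querying a remote SPARQL end point involves (this is plain Python, not the Bioclipse scripting API): under the SPARQL Protocol, the query text is sent as the "query" parameter of an HTTP GET request. The helper name is hypothetical, and the DBpedia end point is used only as a familiar example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
url = sparql_get_url("http://dbpedia.org/sparql", QUERY)
print(url)

# Fetching the results would then be a normal HTTP request, for example
# urllib.request.urlopen(url).read(); it is left out so the sketch runs offline.
```

The same URL works against a local store that speaks the protocol, which is why scripts can switch between in-memory and remote sources with little change.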

Last week Friday, I gave a talk about this work at SWAT4LS in
Amsterdam, and my slides are available in my blog [0].

Getting back to the data, I am working on making various unique
molecular property resources available as RDF. This includes the GNU
FDL-licensed NMRShiftDB data, which contains NMR spectra (mostly
carbon-13) used for metabolite identification (think finding
biomarkers). There are also two smaller CC0 data sets, one based on
ChemPedia [1], a new crowd-sourcing endeavor for naming molecules (no
i18n support yet, but requested), and the RDF Open Notebook Science
Solubility project [2], which we described in a Chapter in the recent
Beautiful Data book from O'Reilly.

There are other things I am doing, which include an ontology for
molecular (or QSAR) descriptors, and an RDF equivalent for the
cheminformatics data model used by the CDK. This would, though I am
myself not convinced this is really where we want to go, allow
serialization of full molecular structures as RDF data, though parts
of this may very well be rather useful for XHTML+RDFa for scientific
publication of, for example, organic synthesis papers...

I'd very much like to help get these data sets into the LODD network
(particularly the last two, which are easiest because of the CC0
license).

One thing I want to do soon (actually, as part of the SWAT4LS
proceedings paper), is create a data set with CDK-based molecular
similarities. The CDK can calculate various, and this will create a
nice sparse matrix. I'm leaning towards doing the molecules in
DBPedia, but am more than open to analysing other Open data sets too
(bearing a proper license, or proper Public Domain statement, like
CC0). I'll put up the final script on MyExperiment.org anyway, for
others to analyze other data sets. No ETA for that, though.
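
To make the sparse-matrix idea concrete, here is a small sketch that assumes Tanimoto similarity on fingerprint bit sets, which is only one of the measures that could be used. The fingerprints below are made up, not CDK output, and the 0.5 cut-off is arbitrary.

```python
from itertools import combinations

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by all distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def sparse_similarities(fingerprints: dict, threshold: float) -> dict:
    """Keep only pairs at or above the threshold, so the matrix stays sparse."""
    result = {}
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[name_a], fingerprints[name_b])
        if score >= threshold:
            result[(name_a, name_b)] = score
    return result

# Made-up fingerprints (sets of on-bit positions), not real CDK output.
fps = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({2, 3, 4, 5}),
    "mol_c": frozenset({7, 8}),
}
similar = sparse_similarities(fps, 0.5)
print(similar)
```

Only the pairs that survive the cut-off would need to be written out as triples, which keeps the resulting data set small.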

An example script downloads molecules from DBPedia and visualizes them
2D in a molecule table [3,4].

I am looking forward to hearing your comments and ideas on this work.

Regards,

Egon

0.http://chem-bla-ics.blogspot.com/2009/11/swat4ls-linking-open-drug-data-to.html
1.http://chem-bla-ics.blogspot.com/2009/11/chempedia-rdf-1-sparql-end-point.html
2.http://chem-bla-ics.blogspot.com/2009/11/open-notebook-science-solubility-sparql.html
3.http://egonw.posterous.com/molecules-in-dbpedia-visualized-with-bioclips
4.http://www.myexperiment.org/workflows/927

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: [hcls] Updated wiki page for HCLS Knowledge Base

2009-10-14 Thread Egon Willighagen
On Wed, Oct 14, 2009 at 11:30 AM, Matthias Samwald samw...@gmx.at wrote:
  that said, I also don't think the final SPARQL end point should be remote 
 at all,

 So where should the final SPARQL end point be located? In a server inside
 the intranet of each organization? On the client side? How should it be
 filled? By crawling linked data resources? Please specify.

The current scientific practice is to set up your input data first,
and then do analysis... I have yet to see any scientist do it
differently.

Projecting this to RDF, the input would be a single SPARQL end point.
But since the scientist does want to aggregate and preprocess the data
to his particular wishes and needs, *this* SPARQL end point will be
local, so, yes on the client side.

*How* the scientist will fill this local repository highly depends on
his wishes too. This will likely be a mix of remote SPARQL queries,
RDFa for extracting data from this new journal paper in Nature (...),
some local RDF files (and perhaps an institutional SPARQL end point, though those
resources seem to be rather unused so far, perhaps because they do not
have SPARQL end points yet), some properties calculated locally and/or
remotely which he needs too, etc. So, yes, by crawling the cloud for
data.

Point is: crawling will and must be a central part of the process. And
as such, both Linked Data spread around the web *and* SPARQL end
points will go hand in hand. But I disagree that SPARQL end points
are what we should aim at as data providers, as scientists will never use
it as such anyway.

Just think of it like this: if you aggregated the data already in the
way the scientist wants it, he is no longer doing cutting edge
science (it's already been done!). Yes, analysis goes beyond the
aggregation, but to provide your scientific point, you will provide
counter arguments based on *external* data, hence the crawling...

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: [hcls] Updated wiki page for HCLS Knowledge Base

2009-10-13 Thread Egon Willighagen
On Tue, Oct 13, 2009 at 4:13 PM, Mark ma...@illuminae.com wrote:
 On Tue, 13 Oct 2009 03:34:01 -0700, Matthias Samwald samw...@gmx.at wrote:
 Besides, even though linked data URIs and federated queries are nice, it
 is quite practical to have all relevant datasets accessible through a single
 SPARQL endpoint.

 :-)  I still find statements like this amusing, especially from the flagship
 organization representing the Semantic Web for healthcare and life sciences.

 I do understand *why* statements like this are made, and I do understand how
 much easier it is to demonstrate the Semantic Web if you remove Web from
 the equation... but it still irks me LOL!

I second that :)

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: [hcls] Updated wiki page for HCLS Knowledge Base

2009-10-13 Thread Egon Willighagen
On Tue, Oct 13, 2009 at 5:14 PM, Matthias Samwald samw...@gmx.at wrote:
 Back in 2006 in the Creeps paper, Ben and I wrote that the SWHCLS
 community has spent too much time focusing on Semantic rather than on
 Web... and it really hasn't changed much nearly 4 years later :-(  I
 honestly think that, if we all pulled in the same direction, we could make
 the Web aspect of the Semantic Web work better than it currently does...

 This is quite dissonant with the impressions that I have. The Linked Data
 paradigm is very popular, and a lot of work in this area (as well as query
 federation) is going on at the moment. What exactly are you missing in the
 work that is currently going on?

You wrote:
Besides, even though linked data URIs and federated queries are nice,
it is quite practical to have all relevant datasets accessible through
a single SPARQL endpoint.

Linked Data focuses on crawling the web. At least, that's the
impression I have... yet, a single store to query is indeed much more
convenient... it's sort of contradicting:

I can appreciate Mark's comment, as we have yet to come up with good
solutions for when to stop crawling and start analyzing the data...
the crawling is iterative, and each new analysis step may trigger new
queries (and probably should)... A single federated query is not what
I expect to be the final solution; instead, I expect an iterative
process, where possible steps may be federated, but iterative
nevertheless...
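
The iterative process can be sketched with a toy example: dereference one URI at a time into a local store and let the analysis decide after every step whether enough data has been collected. The WEB dictionary stands in for the real web, and all URIs, predicates and names in it are made up.

```python
from collections import deque

# Toy stand-in for the web: each URI "dereferences" to a list of triples.
# All URIs and predicates here are made up for illustration.
WEB = {
    "ex:drugA": [("ex:drugA", "ex:targets", "ex:protein1")],
    "ex:protein1": [("ex:protein1", "ex:partOf", "ex:pathway9")],
    "ex:pathway9": [("ex:pathway9", "ex:label", "glycolysis")],
}

def crawl(seed: str, is_enough, max_steps: int = 100) -> set:
    """Dereference URIs breadth-first; the analysis decides when to stop."""
    store, queue, seen = set(), deque([seed]), {seed}
    for _ in range(max_steps):
        # Stop when nothing is left to fetch or the analysis is satisfied.
        if not queue or is_enough(store):
            break
        for triple in WEB.get(queue.popleft(), []):
            store.add(triple)
            obj = triple[2]
            # Follow links to resources we have not dereferenced yet.
            if obj in WEB and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return store

# Stop as soon as the local store contains any pathway membership.
found = crawl("ex:drugA", lambda s: any(p == "ex:partOf" for _, p, _ in s))
print(len(found))
```

The stopping rule is passed in as a function because, as argued above, when to stop crawling depends on the analysis and not on the data provider.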

Having one single SPARQL end point indicates the crawling is done,
where we have only just started linking things together... that said,
I also don't think the final SPARQL end point should be remote at all,
and it steps over the current data licensing issues we still
(unfortunately) have to deal with...

And then I just think about remote services calculating things on the
fly, which will likely not be part of a SPARQL end point anyway...
Mark, or perhaps things like SADI should have a SPARQL interface? :)

Anyways... looking forward to meeting some/many of you in Amsterdam!

Egon

-- 
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: Linking Open Drug Data wins First Prize of Triplification Challenge

2009-09-07 Thread Egon Willighagen
Hi Anja!

On Sun, Sep 6, 2009 at 12:05 AM, Anja Jentzscha...@anjeve.de wrote:
 the Linking Open Drug Data Task Force just won the first prize of the
 Linking Open Data Triplification Challenge [1] which took place at the
 I-Semantics in Graz. The paper we submitted can be found online [2] as well
 as the talk [3].

First of all, congrats with the win!

 Thanks to the LODD group for all the hard work and commitment to the
 project. It is a pleasure working with you!

I have a question about the title of the email... it is now entering
the social web as title too, and was wondering about the Open Data
nature of the data... the wiki page [4] does not provide any license
information or copyright statement, or any other claim about the users'
rights to modify (extend, fix typos, ...) and redistribute, two rather
important aspects of Open Data [5]. In particular, I was not aware
that the DrugBank data actually was Open.

That said, I am not entirely sure how the LODD name of the task force
came about and whether it actually attempts to identify itself with
Open Data, or merely downloadable data.

Can you please elaborate on these issues?

Egon

4.http://esw.w3.org/topic/HCLSIG/LODD/Data
5.http://en.wikipedia.org/wiki/Open_Data

-- 
Post-doc @ Uppsala University
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers



Re: Can RDFa be used on XML: pharma information

2009-06-23 Thread Egon Willighagen
On Tue, Jun 23, 2009 at 11:20 AM, Rick Jellifferjelli...@allette.com.au wrote:
 I am working on improving the semweb markup on an Australian government
 Department of Health and Aging website, which has HTML and XML versions of
 the medicines allowed for prescription and the amount the government pays.
 It has various links to interesting documents, and we want to make it more
 semweb friendly.

 Here are two example pages to give you the idea (they have different
 selections of data):

 http://www.pbs.gov.au/html/consumer/search/results?term=Zyprexa%20Zydispublication=GE

  http://www.pbs.gov.au/xml/consumer/search/results?term=Zyprexa%20Zydispublication=GE

 We are doing some general things like improving the microformats (DC and
 hproduct) in the HTML.

 But the plan was to decorate the XML (which has extra information)  with the
 appropriate RDFa, which seems perfect. But now I see that the RDFa spec says
 that RDFa is designed for use on XHTML. We do not want to use it that way, we
 want to augment the XML.

 So I was wondering if anyone here had any advice? I see the choices

Instead of the XML end point, I would express all that content as RDF
(possibly in the XML format). If you need the XML for the metadata
info on the request, you could consider putting a RDF element
somewhere in your custom XML.

Egon


-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Can RDFa be used on XML: pharma information

2009-06-23 Thread Egon Willighagen
On Tue, Jun 23, 2009 at 11:49 AM, Rick Jellifferjelli...@allette.com.au wrote:
 So there is still no convenient way to mark up existing XML as RDF?  It was
 a showstopper 10 years ago but I kind of expected there would have been some
 progress... sigh

Define 'markup'... you can just embed your RDF in your XML, using
RDF/XML... the namespacing is the indication what is RDF and what is
not... no other 'markup' needed...
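
To illustrate how the namespace alone separates the RDF from the rest, here is a small sketch using Python's standard xml.etree.ElementTree. The custom XML, its http://example.org/ namespace and the item code are made up for the example; only the RDF and Dublin Core namespaces are real.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical custom XML with an rdf:RDF block embedded in it.
DOC = """<results xmlns="http://example.org/pbs#">
  <item code="1234">Example medicine</item>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://example.org/item/1234">
      <dc:title>Example medicine</dc:title>
    </rdf:Description>
  </rdf:RDF>
</results>"""


def embedded_rdf(xml_text):
    """Return every rdf:RDF element found anywhere in the document."""
    root = ET.fromstring(xml_text)
    return list(root.iter("{%s}RDF" % RDF_NS))


blocks = embedded_rdf(DOC)
# Subjects described inside the embedded RDF blocks.
descriptions = [desc.get("{%s}about" % RDF_NS)
                for block in blocks
                for desc in block.findall("{%s}Description" % RDF_NS)]
```

A client that knows nothing about the custom elements can still pick out the rdf:RDF blocks and hand them to any RDF/XML parser.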

Can you elaborate on the inconveniences you talk about a bit more?
That makes providing solutions easier...

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Can RDFa be used on XML: pharma information

2009-06-23 Thread Egon Willighagen
On Tue, Jun 23, 2009 at 12:15 PM, Rick Jellifferjelli...@allette.com.au wrote:
 Markup = annotation.  Taking existing data and adding stuff to make it more
 useful, without disrupting existing uses of that data (and without creating
 the size/maintenance issues you get from duplication.)
 One of the rationales for this project is to make more effective use of
 bandwidth, which makes me lean against duplication somewhat, but it may
 indeed be the appropriate way.

OK, so the requirement is to: 1. stick with the current XML, 2. provide RDF/XML.

I think the XSLT route proposed by others is the way to go then, making a
third end point, which would take the current XML as input and convert it
with XSLT to RDF/XML. Using RDF/XML has the advantage here that you
can validate the output of your XSLT stylesheet too,
increasing your chances of detecting typos etc.
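
A rough sketch of such a third end point, with the caveat that plain Python (xml.etree.ElementTree) stands in for the XSLT stylesheet here, since the Python standard library has no XSLT processor. The input element names, the base URI and the choice of dc:title are assumptions for the example only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical input in the existing custom XML format.
SOURCE = """<results>
  <item code="1234"><name>Example medicine</name></item>
  <item code="5678"><name>Another medicine</name></item>
</results>"""


def to_rdfxml(xml_text, base="http://example.org/item/"):
    """Convert the custom XML into an RDF/XML string (stand-in for the XSLT)."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    out = ET.Element("{%s}RDF" % RDF)
    for item in ET.fromstring(xml_text).iter("item"):
        desc = ET.SubElement(out, "{%s}Description" % RDF,
                             {"{%s}about" % RDF: base + item.get("code")})
        title = ET.SubElement(desc, "{%s}title" % DC)
        title.text = item.findtext("name")
    return ET.tostring(out, encoding="unicode")


def check_output(rdfxml_text):
    """Minimal output check: well-formed, rdf:RDF root, every node has a subject."""
    root = ET.fromstring(rdfxml_text)
    if root.tag != "{%s}RDF" % RDF:
        raise ValueError("root element is not rdf:RDF")
    for desc in root:
        if desc.get("{%s}about" % RDF) is None:
            raise ValueError("description without rdf:about")
    return len(root)


rdfxml = to_rdfxml(SOURCE)
described = check_output(rdfxml)
```

The check is deliberately small; a real setup would more likely run the output through a proper RDF/XML parser or validator.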

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Can RDFa be used on XML: pharma information

2009-06-23 Thread Egon Willighagen
On Tue, Jun 23, 2009 at 2:48 PM, Rick Jellifferjelli...@allette.com.au wrote:
 I see that the 2008 draft
  http://www.w3.org/2006/07/SWD/RDFa/rdfa-overview
 says
  RDFa itself is intended to be a technique that allows for adding metadata
 to any (XML) markup document, including SMIL, RSS, SVG, MathML, etc. Note,
 however, that in the current state, RDFa is being defined only for the
 (X)HTML family of languages.

 So I think I will go ahead and add some RDFa markup to the XML, so that
 there is some data on the web which might stimulate developers or inform
 them, and tell the client that we may need to change tack.

The problem here is to define what attributes your XML will use to
define the RDFa hooks... what attributes will define a new subject,
the predicate, and how you define the object...

Because the XML is using a local namespace, it will be unrecognizable
for any client... however, given you define those attributes (or via
new elements), you should be able to embed this RDFa in the HTML more
easily too...
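
As a sketch of what defining those hooks could look like: assume, purely for illustration, that the custom XML borrows the attribute names about (new subject), property (predicate) and resource (URI object, with the element text as literal object otherwise). These names mirror RDFa but their use on this XML, like the example document itself, is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom XML decorated with RDFa-like attributes.
DOC = """<results xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item about="http://example.org/item/1234">
    <name property="dc:title">Example medicine</name>
    <maker property="dc:publisher" resource="http://example.org/org/acme"/>
  </item>
</results>"""


def triples(element, subject=None):
    """Walk the tree; 'about' sets the subject for the element and its children."""
    subject = element.get("about", subject)
    found = []
    prop = element.get("property")
    if prop is not None and subject is not None:
        # Object: the 'resource' attribute if present, else the text content.
        obj = element.get("resource") or (element.text or "").strip()
        found.append((subject, prop, obj))
    for child in element:
        found.extend(triples(child, subject))
    return found


result = triples(ET.fromstring(DOC))
```

A generic client would not know these conventions unless they are documented somewhere, which is exactly the recognizability problem with a local namespace.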

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



drug side effects

2009-04-22 Thread Egon Willighagen
Just was told about this:

http://sideeffects.embl.de/

Rather permissively licensed.

SIDER contains information on marketed medicines and their recorded
adverse drug reactions. The information is extracted from public
documents and package inserts. The available information include side
effect frequency, drug and side effect classifications as well as
links to further information, for example drug–target relations.

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: HCLS Telcon

2009-04-02 Thread Egon Willighagen
Hi Scott (and others),

On Thu, Apr 2, 2009 at 12:42 AM, M. Scott Marshall
marsh...@science.uva.nl wrote:
 Here's the reminder for Thursday's HCLS call.

I will not be able to make it today.

And I think I found which telcons are most interesting to me (LODD
and BioRDF), though several others have my interest too. The last two
weeks I have been working from home a bit more, with no option of
dialing in, so attending as much as possible from IRC. However, I
found it difficult to use this mechanism to report on the things I
have been up to...

So, I was wondering, what would be the best way to do this? Just send
an email to this list, and mention that:

- I converted NMRShiftDB.org into RDF, with NMR spectra for small,
drug-like molecules
- linked to that, Bio2RDF and to ChemSpider from rdf.openmolecules.net
- we are working on converting the StARLite DB into RDF, which holds
drug-assay-protein-proteinClass relations (though that will not get
online before summer)
- extending Bioclipse with RDF support using Jena, to allow
visualization of structural data, e.g. 3D protein structures, ligands
in 2D/3D

and just ask for feedback? (Thanx to those who already gave feedback
via other channels! And thanx to Kingsley for using the
rdf.openmolecules.net data in one of his applications.)

For the rest, I have the following on my todo list:

* write up my views on similarity of molecules in the RDF world
* write up what I think of SKOS with respect to OWL, and how we used
SKOS in MetWare for those reasons
* write up what my impression is of SWAN

It's been grant hunting season here, so been delayed with the above
things. Sorry about that.

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: blog: semantic dissonance in uniprot

2009-04-02 Thread Egon Willighagen
On Thu, Apr 2, 2009 at 5:35 PM, Michel_Dumontier
michel_dumont...@carleton.ca wrote:
 Actually, I'd say OWL is to blame here... that is, the OWL class was
 not properly defined.

 Just to clarify - it's not OWL that's the problem. It's the
 representation of Chemistry in a formal logic-based language where it
 actually matters what you say and how you say it.

Yeah, sorry, I knew I had to phrase that more correctly... it's not
the OWL standard, but whatever had been defined using OWL. These
things are pretty tricky, and if you read the IUPAC Gold Book on
definitions, it will not get much clearer either; there will be plenty
of use of owl:sameAs and all alternatives that define more loose
similarity to capture current terms...

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Announcement: Bio2RDF 0.3 released

2009-03-23 Thread Egon Willighagen
On Mon, Mar 23, 2009 at 12:09 AM, Peter Ansell ansell.pe...@gmail.com wrote:
 2009/3/22 Egon Willighagen egon.willigha...@gmail.com:
 On Sun, Mar 22, 2009 at 1:42 AM, Peter Ansell ansell.pe...@gmail.com wrote:
 Do you also provide InChIKey resolution?

 No. That requires look up, so only works against an existing database.
 Chemspider is doing this, but is not a general solution. InChIKey's
 are not unique, though clashes rare, and not observed so far.

 I didn't think it required a lookup to derive an InChIKey given an
 InChI.

Ah, sorry. InChIKey can be computed, but I thought you meant resolving
what structure has a given InChIKey... going from InChIKey to
structure does require lookup, generation of the InChIKey from structure
(or InChI) does not.
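
The asymmetry can be shown with a toy example. Note that this is NOT the InChIKey algorithm: a truncated SHA-256 hex digest merely stands in for it, and the toy_key and build_index names are made up for the sketch.

```python
import hashlib


def toy_key(inchi):
    """Fixed-length key derived from an InChI (not the real InChIKey algorithm)."""
    return hashlib.sha256(inchi.encode("utf-8")).hexdigest()[:14].upper()


def build_index(inchis):
    """Going from key back to InChI only works for structures we indexed."""
    return {toy_key(inchi): inchi for inchi in inchis}


# Computing a key needs nothing but the InChI itself...
key = toy_key("InChI=1/CH4/h1H4")

# ...but resolving a key needs a database of known structures.
index = build_index(["InChI=1/CH4/h1H4", "InChI=1/H2O/h1H2"])
resolved = index.get(key)
unknown = index.get(toy_key("InChI=1/C2H6/c1-2/h1-2H3"))
```

Here resolved is the methane InChI again, while unknown is None because ethane was never put in the index.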

 I realise that clashes are rare but possible, just wondering
 whether it would be supported. Leaving them out altogether just seems
 like missing possibly extra information.

I'll add them where missing.

 [1] It is just that InChI's
 can get pretty long for complex molecules and it makes it harder for
 people to accurately copy and paste them around when needed.

 Indeed. However, InChIKey is less precise. Since RDF allows us to do
 things in an exact manner, I'd rather use InChI.

 InChiKey's might be better for general use in RDF because they have a
 guaranteed identifier length and therefore won't become cumbersome for
 complex molecules.

 But can never be used for owl:sameAs like relations.

 Having them as properties could give someone a quick clue as to
 whether they are looking at the same molecule. Humans do interact with
 RDF (inevitably), and having short hash values can still be valuable.
 Given that hashes are usually designed to amplify small changes, it is
 easier than reading a 10 line InChI to determine whether there was
 a difference.

Agreed.

 Currently all of the InChI's that I have seen have been as Literals,
 but it would be relatively easy to also provide them as URI's to
 provide the link since you have a resolver for them set up.

 That was precisely the reason why I started the service.

 Good work.

Thanx for the feedback!

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Announcement: Bio2RDF 0.3 released

2009-03-23 Thread Egon Willighagen
Hi Kei,

On Mon, Mar 23, 2009 at 3:37 PM, Kei Cheung kei.che...@yale.edu wrote:
 As part of the biordf query federation task, we are currently exploring a
 federation scenario involving integration of neuroreceptor-related
 information. For example, IUPHAR provides information for different classes
 of receptors. For example, in the table shown at
  http://www.iuphar-db.org/GPCR/ReceptorListForward?class=class%20A, ligands
 are provided for receptors but not InChI codes ...

That's an interesting table... not Open it seems... did you ask (and
get) permission to redistribute under a free license,
perhaps? The list is not overly long, and InChIs could be added
manually, though one would have to assume the compound names (btw,
some are compound classes!) are unique...

PubChem also has links to MeSH terms, and I also see a MeSH term in
the ChemBox on Wikipedia... that would be open data, and could provide
similar information.

I have been pondering setting up an open source semantic wiki for
linking data, for cases where no Open source for that is available, but
have not had time for that yet.

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: blog: semantic dissonance in uniprot

2009-03-21 Thread Egon Willighagen
On Sat, Mar 21, 2009 at 5:01 AM, eric neumann ekneum...@gmail.com wrote:
 There is no such thing as a referenceble instance of a specific instantiated
 molecule (that specific molecule); all gene, protein, and chemical records
 are about the category or group of exemplar molecules:
 SAME molecular structure, NOT SAME atoms (so we already aren't really things
 in the real world ;-) ); all molecular databases are based on this asserted
 fact.

Even worse. Since there are 10^20 molecules in most used materials,
many 'molecular' properties are really material properties. A melting
point is not a molecular property, but often even reported as
elemental property.

 Most users of molecular information aren't ignorant about the difference
 between a protein and a record of a protein; it's just that they don't want
 to deal with all the extra CS mechanics (that prevent getting their job
 done). And so an instance of a protein record in a database (or a reference
 to it from another database) is the closest thing to saying: here's the
 protein.

Chemists are not interested in single molecules (well, most are not,
but with increasing nanotechnology...). I was told recently that upper
ontologies have proper mechanisms to point out the difference between
(in Java terminology) objects and classes, or instances and concepts.

 Different records exist for the same protein, which indeed has been a
 historic point of complication; but this is really a social issue, not a
 semantic one, and the key data authorities have already for years
 coordinated on this point by supplying cross-references to each
 other.

There is another level to this: that of a measurement or observation,
and the identity we assign to it. The sequence of a protein, or the
molecular structure of a drug, is the model that people assigned to
some measurement. Measurements that point to the same measurable may
actually be assigned different identities...

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Announcement: Bio2RDF 0.3 released

2009-03-20 Thread Egon Willighagen
Hi Peter,

On Fri, Mar 20, 2009 at 7:56 AM, Peter Ansell ansell.pe...@gmail.com wrote:
 * Some http://database.bio2rdf.org/database:identifier URI's are given
 by this, but these aren't standard, and are only shown where there is
 still at least one SPARQL endpoint available which uses them. People
 should utilise the http://bio2rdf.org/database:identifier versions
 when linking to Bio2RDF.

I'm using ChEBI IDs right now to link to your RDF with owl:sameAs:

http://rdf.openmolecules.net/?InChI=1/C12H8O2S/c13-8-5-6-10-11(12(8)14)7-3-1-2-4-9(7)15-10/h1-6,13-14H

Linking back to rdf.openmolecules.net can be done as shown above with the InChI.
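
For illustration, such a link could be written out as an N-Triples line along these lines. The two URI patterns are the ones mentioned in this thread; the helper names are made up, and the ChEBI identifier 0000 is a placeholder, not the real ID of any compound.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def openmolecules_uri(inchi):
    """rdf.openmolecules.net URI for a molecule, with the InChI as query part."""
    return "http://rdf.openmolecules.net/?" + inchi


def bio2rdf_uri(database, identifier):
    """Bio2RDF URI following the http://bio2rdf.org/database:identifier pattern."""
    return "http://bio2rdf.org/%s:%s" % (database, identifier)


def same_as(subject, obj):
    """One N-Triples statement asserting owl:sameAs between two URIs."""
    return "<%s> <%s> <%s> ." % (subject, OWL_SAME_AS, obj)


# Methane InChI, linked to a placeholder (not real) ChEBI identifier.
line = same_as(openmolecules_uri("InChI=1/CH4/h1H4"),
               bio2rdf_uri("chebi", "0000"))
```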

I'll hook up to your DrugBank and DBPedia later today. Do you already
make links between ChEBI and DBPedia? I created links by converting
SMILES into InChIs:

http://chem-bla-ics.blogspot.com/2009/02/dbpedia-enters-rdfopenmoleculesnet.html

Comments most welcome!

Egon

-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: Introduction(s) to HCLS IG

2009-03-09 Thread Egon Willighagen
Hi all,

On Fri, Mar 6, 2009 at 8:27 PM, M. Scott Marshall
marsh...@science.uva.nl wrote:
 Several new people have joined HCLS IG http://www.w3.org/2001/sw/hcls/
 lately. Welcome! We have a tradition of sending an Introduction to the
 mailing list to help participants get to know each other and find common
 interests. Would those of you who haven't yet done so please send an
 introduction to the list?

my apologies for not having sent around an introduction on who I am
and what I do before, but here goes.

My name is Egon Willighagen and I am currently a post-doc at Uppsala
University, working on applying cheminformatics in drug discovery. I
am one of the lead developers of the (open source) Chemistry
Development Kit, and got a PhD (2008) in representation of molecular
systems in light of data analysis, which involves distribution of data
too, which explains my long standing interest in semantic markup of
molecular data, such as Chemical Markup Language. My blog is a good
resource of what I have been doing in general,
http://chem-bla-ics.blogspot.com/, or otherwise my publications,
http://www.citeulike.org/user/egonw/tag/papers.

Currently I am working on setting up RDF for small molecules, with the
InChI as central identifier: http://rdf.openmolecules.net/

In addition to this I am extending Bioclipse with RDF support (see my
blog), which allows visualization of molecular data in 2D/3D and use
it for model building and pattern recognition.

In the past year I worked at Wageningen University, where I worked on
a SKOS-based ontology for metabolomics, for which the software under
development in an international consortium is available from
http://metware.org/.

Hoping this was informative,

Egon Willighagen


-- 
Post-doc @ Uppsala University
http://chem-bla-ics.blogspot.com/



Re: C-SHALS 2009

2009-01-23 Thread Egon Willighagen

On Fri, Jan 23, 2009 at 12:33 AM, Susie M Stephens
stephens_susi...@lilly.com wrote:
 I wanted to give everyone a heads up about the C-SHALS Conference [1].

Thanx for the pointer.

Is there an overview of conferences of interest for people on this
list, new-comers like me in particular? (Is there any common 'tag'
used in social bookmarking (etc) used by anyone?)

For myself, I am looking for something in Aug-Dec 2009...

Thanx,

Egon

-- 

http://chem-bla-ics.blogspot.com/



Re: Overview of conferences (like C-SHALS)

2009-01-23 Thread Egon Willighagen

On Fri, Jan 23, 2009 at 2:35 PM, Kei Cheung kei.che...@yale.edu wrote:
 You might also want to give del.icio.us (http://delicious.com/) a try.

Yes, agreed. I have been using this for quite some time now.

I'll start using the tags: hclsig, lodd and others, for things I find
relevant to this list.

With that in mind, any FriendFeed users around?
http://friendfeed.com/rooms/hclsig?

Egon

-- 

http://chem-bla-ics.blogspot.com/



Re: Pharma Ontology Telcon Minutes

2009-01-23 Thread Egon Willighagen

On Fri, Jan 23, 2009 at 5:42 PM, Susie M Stephens
stephens_susi...@lilly.com wrote:
 The minutes are now available from yesterday's Pharma Ontology call [1].
 Thanks very much to Christi for scribing.

The chatting between Elgar and Scott M at the end of the IRC
transcript was actually not with Elgar, but with Egon (me).

Egon

-- 

http://chem-bla-ics.blogspot.com/



Re: RDF for molecules, using InChI

2007-09-19 Thread Egon Willighagen

Hi all,

I have not applied all suggestions sent by people on the list, but
wanted to give a short update. So, no RDF document pointing to all
molecules with non-trivial RDF statements, and no RDF-based
definitions of the properties used. Apologies for that.

On 8/2/07, Egon Willighagen [EMAIL PROTECTED] wrote:
 I played a bit with RDF for molecular data this week, and now
 have a RDF provider service (try methane [1]), which is written in
 PHP and uses XSLT to create a HTML frontend (*). It works for any
 molecule/InChI, but depends on 'plugins' to set up any other than the
 implied properties (i.e. reproduce the InChI).

I have added a new module that extracts 'tags' for molecules [1], and
am quite happy with the setup. It is using the rdf.openmolecules.net
URL identifier, which can be added to Connotea and tagged, like any
journal article or website. The website uses the Connotea API to
convert these tags into RDF properties for the InChIs.

In the blog item, I give some applications of the tagging, like
defined sets of molecules etc. SPARQL would be rather suited to
extract such sets from the RDF statements.

Again, comments most welcome.

Egon

PS. don't want to stir up the URL-versus-URI discussion. Tagging
molecules has just been on my wish list for some time, and this seems
to work well.

1.http://chem-bla-ics.blogspot.com/2007/09/tagging-molecules-mashup-of-connotea.html

-- 

http://chem-bla-ics.blogspot.com/



Re: RDF for molecules, using InChI

2007-08-21 Thread Egon Willighagen

Eric,

On 8/18/07, Egon Willighagen [EMAIL PROTECTED] wrote:
 On 8/17/07, Eric Neumann [EMAIL PROTECTED] wrote:
   Thanks for the pointer-- is there a list of all the molecules you store
  something about?

 Not at this moment. That would be a rather lengthy RDF doc. The number
 of molecules of which something is known is currently in the order of
 10M. I have not taken up the challenge of hosting that.

Actually, I could make a list of some 250 molecules [1]. Should I make
one RDF file listing all triples for all molecules, or make one master
file, which points to the current RDF 'files'?

Egon

1.http://cb.openmolecules.net/inchis.php

-- 

http://chem-bla-ics.blogspot.com/



Re: RDF for molecules, using InChI

2007-08-18 Thread Egon Willighagen

Hi Eric,

On 8/17/07, Eric Neumann [EMAIL PROTECTED] wrote:
  Thanks for the pointer-- is there a list of all the molecules you store
 something about?

Not at this moment. That would be a rather lengthy RDF doc. The number
of molecules of which something is known is currently in the order of
10M. I have not taken up the challenge of hosting that.

 I've noticed your server will handle any InchI string it
 receives, though CIDs and other annotations will not be returned.

Correct. It looks up information from other sources, so the 'service'
is more like a relay or aggregator than a database.

 Since one
 can determine the molecular weight from InchI, would it make sense to
 include such a feature?

Yes. One thing I am interested in is using SPARQL to find incorrect
data in databases. And it does happen that databases show one
InChI (e.g. of salts), but derive properties from only one fragment...
that will show up in the MW.
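
A minimal sketch of such a consistency check, in Python rather than SPARQL. It only reads the formula layer of the InChI and ignores charge and proton layers; the small mass table, the 0.5 tolerance and the function names are assumptions for the example, and the reported weights are invented test values.

```python
import re

# Approximate standard atomic masses for the few elements used below.
MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007,
        "Na": 22.990, "Cl": 35.45}


def formula_from_inchi(inchi):
    """The formula layer is the second '/'-separated part of the InChI."""
    return inchi.split("/")[1]


def molecular_weight(formula):
    """Sum over all '.'-separated components, e.g. both parts of a salt."""
    total = 0.0
    for component in formula.split("."):
        match = re.match(r"(\d*)(.*)", component)
        multiplier = int(match.group(1) or 1)
        for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2)):
            total += multiplier * MASS[symbol] * int(count or 1)
    return total


def suspicious(inchi, reported_mw, tolerance=0.5):
    """True if the reported MW does not match the full formula in the InChI."""
    return abs(molecular_weight(formula_from_inchi(inchi)) - reported_mw) > tolerance


# Methylamine hydrochloride: the formula layer lists both fragments.
SALT = "InChI=1/CH5N.ClH/c1-2;/h2H2,1H3;1H"
flagged = suspicious(SALT, 31.06)      # MW of the free base only
accepted = suspicious(SALT, 67.52)     # MW of the whole salt
```

A database entry that pairs the salt's InChI with the weight of the free base alone gets flagged, which is the kind of mismatch meant above.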

  Finally, I see you point to Pubchem CIDs and Pubchem uses InchIs as well,
 so is there any way to include all compounds PC refers to as well?

Yes, I do plan to write a relay for PubChem. I will make these scripts
open source asap, but been busy with project reports and grant
proposals the last two weeks.

Egon

-- 

http://chem-bla-ics.blogspot.com/



Re: RDF for molecules, using InChI

2007-08-17 Thread Egon Willighagen

Hi all,

On 8/2/07, Egon Willighagen [EMAIL PROTECTED] wrote:
 1.http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4

a quick update, rdf.openmolecules.net is now online, so the above URL becomes:

http://rdf.openmolecules.net/?InChI=1/CH4/h1H4

Egon


http://chem-bla-ics.blogspot.com/



RDF for molecules, using InChI

2007-08-02 Thread Egon Willighagen

Hi all,

I played a bit with RDF for molecular data this week, and now
have a RDF provider service (try methane [1]), which is written in
PHP and uses XSLT to create a HTML frontend (*). It works for any
molecule/InChI, but depends on 'plugins' to set up any other than the
implied properties (i.e. reproduce the InChI). The methane example
mentioned shows some information extracted from Chemical blogspace
[2], but I plan to write other plugins too, e.g. for PubChem,
ChemSpider and other databases.

I have written up some thoughts at [3], and would much like to hear
your opinions and comments.

Looking forward to hearing from you,

kind regards,

Egon Willighagen
http://chem-bla-ics.blogspot.com/

*) FireFox 2.0.0.6 and IE pick up the declared stylesheet, but
Konqueror/Linux does not.

1.http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4
2.http://cb.openmolecules.net/
3.http://chem-bla-ics.blogspot.com/2007/07/rdf-ing-molecular-space.html