existing semantic models for normal conditions of metabolites in bodily fluids?
Hi all, I'm extracting some metabolite-disease relationships and the book I'm reading also lists normal concentrations of metabolites in various bodily fluids for various age groups. For example, for Phe at 0-1 years in serum it lists <80 micromolar ("newborns" is another age group, but most are of the form "x-y years"). Has anyone encoded such information in a semantic manner already? What should I be reading? Looking forward to hearing from you, grtz, Egon -- E.L. Willighagen Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers ORCID: -0001-7542-0286 ImpactStory: https://impactstory.org/EgonWillighagen
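To make the question above concrete, here is a minimal sketch of the kind of record involved, using the Phe example from the message. The `ex:` namespace and all property names are hypothetical placeholders, not an existing ontology; the point is only to show which pieces of information (metabolite, fluid, age group, bound, unit) a real model would need to cover.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical namespace for illustration only; not an existing vocabulary.
EX = "http://example.org/refrange#"


@dataclass(frozen=True)
class ReferenceRange:
    metabolite: str                            # e.g. "Phe"
    fluid: str                                 # e.g. "serum"
    age_group: str                             # e.g. "0-1 years" or "newborns"
    upper_micromolar: Optional[float] = None   # upper bound ("<80" in the book)
    lower_micromolar: Optional[float] = None   # lower bound, if one is given

    def to_turtle(self, node: str) -> str:
        """Serialize this record as a small Turtle snippet."""
        lines = [
            f"@prefix ex: <{EX}> .",
            "",
            f"ex:{node} a ex:ReferenceRange ;",
            f'    ex:metabolite "{self.metabolite}" ;',
            f'    ex:fluid "{self.fluid}" ;',
            f'    ex:ageGroup "{self.age_group}" ;',
        ]
        if self.lower_micromolar is not None:
            lines.append(f"    ex:lowerBoundMicromolar {self.lower_micromolar} ;")
        if self.upper_micromolar is not None:
            lines.append(f"    ex:upperBoundMicromolar {self.upper_micromolar} ;")
        # Close the statement: turn the trailing ';' into '.'
        lines[-1] = lines[-1][:-1] + "."
        return "\n".join(lines)


if __name__ == "__main__":
    phe = ReferenceRange("Phe", "serum", "0-1 years", upper_micromolar=80.0)
    print(phe.to_turtle("phe_serum_0_1y"))
```

Plain string literals are used for the metabolite and fluid only to keep the sketch short; in practice those would presumably be URIs from existing chemistry and anatomy resources.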
Re: FDA: Semantic Web Technologies Fellowship
On Mon, Apr 8, 2013 at 6:50 PM, Michel Dumontier michel.dumont...@gmail.com wrote: Charlie, Eric, Kerstin, and others, keep up your good work to communicate the benefits of Semantic Web technologies in simplifying and improving the delivery of knowledge across the regulatory and pharmaceutical, clinical and health care jurisdictions. Indeed! Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Fwd: [Ops-ttf] Fwd: DrugBank not available?
Hi all,
That sounds like a great idea, but only if the metadata itself is maintained, which is generally hard as these things fall into disrepair. For example the HCLS LODD entry for the DrugBank RDF [1] claims that it is updated regularly when in reality it has not been updated since 17 November 2008.
[1] http://www.w3.org/wiki/HCLSIG/LODD/Data
I got this on another mailing list... I think we really need to go through that list... I guess the Linked Life Data task force is the most likely group to do that? I will bring this up this Monday...
Egon
Re: Can we please stop all those CfPs and use SemWeb technologies instead?
On Tue, May 29, 2012 at 4:42 PM, Bernadette Hyland bhyl...@3roundstones.com wrote: Perhaps use a mail filter rule that includes CfP or other subject tags so you aren't flooded. I for one find the calls useful and not much of any annoyance.
If only that were consistently used... I find a list such as the one Michel pointed out *very* useful, but I also find myself ignoring the list now, because it is mostly CfPs... :( I will give filtering a try, and hope that everyone will remember to add CfP to the subject! Michel, Bernadette, thanx for replying!
Egon
Can we please stop all those CfPs and use SemWeb technologies instead?
Hi all, my inbox is flooded with calls for papers for the many SemWeb conferences, meetings, special issues, ... This is silly: can we please use SemWeb technologies for this instead, so that when I have something to submit, I can just query for upcoming meetings, issues, etc.?
Egon
Predictive Toxicology with Cheminformatics and SemWeb (Fwd: Open postdoc position in Uppsala)
Hi all, the Bioclipse team is looking for someone with good semweb skills, to be applied in predictive toxicology, for a two-year postdoc position. See the announcement below, or follow this link: http://www.uu.se/jobb/others/annonsvisning?tarContentId=186211languageId=1
Egon
-- Forwarded message --
From: Ola Spjuth ola.spj...@farmbio.uu.se
Date: Wed, Apr 18, 2012 at 6:33 PM
Subject: [OTDev] Open postdoc position in Uppsala
To: opentox development mailing list developm...@opentox.org
Dear all, I would like to draw your attention to an open postdoc position in Uppsala which is very relevant to the goals of OpenTox, and would be happy if you could spread this with your colleagues. Kind regards, Ola Spjuth
--
Postdoc position in cheminformatics, bioinformatics, or computer science - applications in predictive toxicology
We have an open postdoc position in the group of Pharmaceutical Bioinformatics at the Department of Pharmaceutical Biosciences, Uppsala University, Sweden. The successful applicant will conduct research on data interoperability for predictive toxicology, and especially design and implement an infrastructure consisting of a database and user interfaces for data and predictive models in toxicology. Of particular interest will be to merge chemical and biological data within a semantic framework, and link toxicity data to genomics and metabolomics data (toxicogenomics) with a connection to the Bioclipse framework (www.bioclipse.net). PhD degree or equivalent scholarly competence in a relevant branch of chem/bioinformatics or computer science and a strong interest in informatics and data integration is required. Required competences include web programming, databases, and working knowledge in Java. Experience with linked data is desirable. Deadline for application: May 9th, 2012.
Link to job ad and application form: http://www.uu.se/jobb/others/annonsvisning?tarContentId=186211languageId=1 ___ Development mailing list developm...@opentox.org http://www.opentox.org/mailman/listinfo/development -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Fwd: Nature Publishing Group Linked Data Platform
That they do not do these things yet sounds like there are a lot of opportunities...
Egon
On 5 Apr 2012 at 17:41, Michel Dumontier michel.dumont...@gmail.com wrote:
In case you haven't seen, Nature PG now has LOD and a SPARQL endpoint: http://www.nature.com/press_releases/linkeddata.html unfortunately, after a cursory look (hope i'm wrong) - i don't think the data links into anything on the semantic web... (mesh terms are literals, pmids are in NPG's namespace with no links to identifiers.org, etc) m.
Nature Publishing Group (NPG) today is pleased to join the linked data community by opening up access to its publication data via a linked data platform. NPG's Linked Data Platform is available at http://data.nature.com. The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc) as well as NPG specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data. NPG's platform allows for easy querying, exploration and extraction of data and relationships about articles, contributors, publications, and subjects. Users can run web-standard SPARQL Protocol and RDF Query Language (SPARQL) queries to obtain and manipulate data stored as RDF. The platform uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL, and the data is integrated with existing public datasets including CrossRef and PubMed. More information about NPG's Linked Data Platform is available at http://developers.nature.com/docs. Sample queries can be found at http://data.nature.com/query.
-- Michel Dumontier Associate Professor of Bioinformatics, Carleton University Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group http://dumontierlab.com
Re: LODD/BioRDF telcon NEXT WEEK (not today)
On Mon, Apr 2, 2012 at 2:55 PM, M. Scott Marshall mscottmarsh...@gmail.com wrote: Having the next LODD/BioRDF telcon next week (NOT TODAY). I will send an agenda for next week's call in the next few days.
Isn't next week's Monday Easter?
Egon
Re: LODD/BioRDF telcon NEXT WEEK (not today)
On Mon, Apr 2, 2012 at 3:48 PM, M. Scott Marshall mscottmarsh...@gmail.com wrote: In the Netherlands it is a day off, yes. In Europe as well, I believe, but not in the U.S. I was hoping that we wouldn't lose too many people.. We can try for another day if that's the case. OK, I'll try to be there too. Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: BioRDF/LODD Monday 11AM ET / 3PM GMT / 4PM CET
Scott, On Mon, Mar 19, 2012 at 12:19 AM, M. Scott Marshall mscottmarsh...@gmail.com wrote: Tomorrow, a teleconference to discuss efforts to employ RDF for expression studies (hoping for a short tour of code from a few contributors) and follow up for LODD emerging practices note. I have teaching at that time tomorrow. Sadly, I did not get around to proofreading the document I said I would try :( Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: [linkedlifew3cnote] Reminder: LODD telcon today Tuesday Nov. 22 at 11AM EDT (5PM CET)
Scott, On Tue, Nov 22, 2011 at 12:36 PM, M. Scott Marshall mscottmarsh...@gmail.com wrote: We are wrapping up a Google Doc version of the W3C note. Please call in to discuss it. We will be requesting comments from HCLS after this last iteration of edits. https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit
I went through the document during the call, and fixed a few typos... but I also ran into the Q10 section, where we suggest void:license. Two thoughts here. First, CC0, which we mentioned just before, is not a license but a waiver; however, VoID specifically allows that combination. Second, VoID does not define a license predicate in its own namespace, but reuses DCTerms for that (dcterms:license), according to http://www.w3.org/TR/void/#license. I have updated the text for that.
Grtz, Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
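As an illustration of the dcterms:license point in the message above, the following is a minimal sketch that emits a VoID description in Turtle. The dataset URI and title are hypothetical placeholders; only the VoID and DCTerms namespaces and the CC0 URI are real. It is not the text that went into the W3C note.

```python
# Sketch: describe a dataset with VoID and state its terms via dcterms:license,
# since VoID reuses DCTerms rather than defining a license predicate itself.

VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def void_description(dataset_uri: str, title: str, license_uri: str = CC0) -> str:
    """Return a Turtle snippet describing one void:Dataset."""
    lines = [
        f"@prefix void: <{VOID}> .",
        f"@prefix dcterms: <{DCTERMS}> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{escape_literal(title)}" ;',
        f"    dcterms:license <{license_uri}> .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset URI and title, for illustration only.
    print(void_description("http://example.org/dataset/demo", "Demo dataset"))
```

The default of CC0 simply mirrors the example discussed in the message; any other license or waiver URI can be passed in.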
Re: provenance questionnaire, v2
On Thu, Sep 1, 2011 at 11:42 PM, Deus, Helena helena.d...@deri.org wrote: For those of you who haven’t answered and would like to give your 2c about how provenance should be dealt with on the semantic web, here’s your chance! Authorization would probably not be considered provenance, but I was wondering if the WG has been talking about that, and if there is an existing ontology that would be suitable for that, compatible with the provenance ontology... it's clear that at least the depositors (provenance) have authorization, so compatibility at that level seems needed... Or? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: provenance questionnaire, v2
On Tue, Sep 6, 2011 at 11:18 AM, Deus, Helena helena.d...@deri.org wrote: I will forward your concerns to the provenance workgroup.
Well, authorization is going to be a big thing in our EU project... various reasons for that: social, contractual, political. That's just the way it is. I can elaborate further on our needs, if that is useful to the WG.
Egon
Re: Database versioning and maintenance
Hi Peter, On Wed, Aug 3, 2011 at 1:02 PM, Peter Ansell ansell.pe...@gmail.com wrote: On your question about Chembl in Bio2RDF, we currently directly use Egon's sparql endpoint to provide access to it, but we can easily switch, thanks to the way the server can be configured. If John Overington is publishing RDF, (preferably using a SPARQL endpoint and scripts so that others can regenerate the RDF if they need to based on the raw data), then we should be able to transparently switch Bio2RDF to using that dataset, barring unresolvable changes in the dataset structure and identifiers. I am working towards an updated version, based on ChEMBL 10. I will make this available via the same SPARQL server as now, but also via Kasabi for those who need a more reliable service (though the rdf.farmbio.uu.se uptime has been quite OK :). Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Version 1.0 of Bio2RDF and Chembl webapps released
On Thu, Jun 30, 2011 at 8:58 AM, Peter Ansell ansell.pe...@gmail.com wrote: The 1.0.1 version of the Bio2RDF server software has been released on Sourceforge. The software is designed to be a Linked Data interface to a range of RDF datasources, with the current examples being Bio2RDF and Chembl. (There was a small bug fix needed to enable endpoint round-robin between 1.0.0 and 1.0.1.) and [2] http://sourceforge.net/projects/bio2rdf/files/chembl-server/chembl-webapp-1.0.1/ Peter, is that app running online somewhere too? What's the link to that? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: LODD telcon on Wed at 11AM ET / 5PM CET
Hi all, I have not been able to find a way to sit behind a desk right now. I will have internet access, but an unstable one. I'll join via IRC, where I will be available to give an update on ChEMBL-RDF. In short, it comes down to:
1. http://rdf.farmbio.uu.se/chembl/sparql has ChEMBL 09
2. but ChEMBL 10 has just been released, with a lot more data; I'll update soon
3. the new RDF has SMILES + InChI, and links out to http://rdf.openmolecules.net/
4. the new RDF uses the CHEMINF ontology, see http://bioportal.bioontology.org/ontologies/1444, which I have collaborated on, with Janna Hastings (EBI, UK) as main developer, and also Michel Dumontier (I earlier already linked out to Bio2RDF)
Egon
On Mon, Jun 6, 2011 at 6:29 PM, M. Scott Marshall mscottmarsh...@gmail.com wrote: Brief update LODD datasets - Anja, Oktie, Egon
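For readers who want to try the endpoint mentioned in the message above, here is a minimal sketch of how a SPARQL-protocol GET request URL can be built. The endpoint address is taken from the message and may no longer be reachable; the query is deliberately generic and assumes nothing about the ChEMBL-RDF vocabulary. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint as given in the message; it may no longer be online.
ENDPOINT = "http://rdf.farmbio.uu.se/chembl/sparql"

# Generic query: lists a few arbitrary triples, no vocabulary assumed.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a GET URL with the query URL-encoded in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, QUERY))
```

The resulting URL can be pasted into a browser or passed to any HTTP client; result formats are negotiated via the Accept header and depend on the server.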
Re: Good news from the EBI Semantic Web Industry Workshop and LODD members
On Wed, May 25, 2011 at 12:56 AM, M. Scott Marshall mscottmarsh...@gmail.com wrote: * Egon Willighagen is updating the downloadable RDF version of ChEMBL to version 9 This download is available from: https://github.com/egonw/chembl.rdf (click the Download button) What I have not done yet is to update the online SPARQL end point at: http://rdf.farmbio.uu.se/chembl/sparql/ That is still using ChEMBL 02 (I'll try to update that soon). Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Fwd: [open-science] LODD Hack Session Notes - Is It Open request signatories needed
Hi Matthias, On Wed, Mar 9, 2011 at 10:46 AM, Matthias Samwald samw...@gmx.at wrote: I'm not sure if clear-cut rules for LODD have been defined. However, many people interested/involved in LODD come from commercially oriented companies (mostly pharmaceutical companies). Therefore it certainly IS a reason for concern if 5 out of 12 datasets disallow commercial use without permission. Agreed. It may also be relevant to all those research institutes that also have commercial activities, many of them who have mixed funding from national and EU projects, but also sell consultancy, etc. It would certainly be helpful to convince these data providers of removing the NC clause, but it seems unlikely. Indeed. This is why that latter was supposed to be informative, rather than requesting dropping that clause. At this moment, I am not aware that anyone has challenged a company for using data with a NC clause, but this is bound to happen. Looking at the list of datasets with NC clauses (including Drugbank, LinkedCT, major parts of SIDER, STITCH), I get the feeling that the providers did not choose to include NC clauses on a whim. Agreed. I guess the best we can realistically do for these datasets is to improve the visibility of these licensing restrictions for people that want to use them. Yes, and that's an actual LODD activity we discussed about half a year ago, and which was the first half of the work done in the hack session: just getting clear what the actual terms of use are :) For three they are unclear, and we will seek clarification for those. That's the three letters being referred to in Jenny's email. Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Fwd: [open-science] LODD Hack Session Notes - Is It Open request signatories needed
Hej Amrapali, On Wed, Mar 9, 2011 at 2:31 PM, Amrapali J Zaveri amrapali.j.zav...@gmail.com wrote: To answer the question about the licensing/copyright issue, according to WHO, if extracts from WHO website of publication are used for research, private study or in a noncommercial document with limited circulation (such as an academic thesis or dissertation), then it is allowed to do so without seeking permission [3]. WHO encourages the use of its information materials for information purposes i.e. when the purpose of the use is to share objective information, whether free of charge or for sale. Only if the material is to be used for commercial purposes, it requires a license. First of all, thanx for sending these pointers! Reading this, however, I am tempted to think that this data is really not Open. Making the whole database available as RDF does not sound like 'extracts' or 'limited circulation' to me... However it does have a copyright notice: Copyright World Health Organization (WHO), 2011. All Rights Reserved [4], so maybe this copyright could be added. Yes. Also, the dataset is available at this SPARQL endpoint: http://db0.aksw.org:8895/sparql and can be downloaded from here: http://aksw.org/Projects/Stats2RDF#h13390-5 . Hope that is sufficient information regarding the open-ness of the dataset. Let me know if any other information is required and suggestions are welcome :) Have you asked permission to 'circulate' the whole of the database? I think Jenny and I will have to update the letter with respect to these new details, and I'm really happy you pointed me to [3]. Egon
Fwd: [open-science] LODD Hack Session Notes - Is It Open request signatories needed
Hi LODD wg members, Jenny Molloy sent around the results of the hack session with the science working group of the Open Knowledge Foundation, looking at the 12 data sets listed in the best practices paper. Of these, two were clearly Open (ChEMBL, TCM-GenEdit), and one clearly not Open (UMLS). Diseasome seems to come from the OMIM data, which is also not Open. Five databases have a non-commercial clause, making them Open according to the LODD definitions (correct?), but not Open following the OKFN's standards. The original plan was to set up an informative package explaining why the NC clause causes problems, but we did not get around to this. From a LODD perspective, this is a non-issue, as I understood (I was not around when LODD was defined). SIDER and STITCH have CCZero components and parts covered by NC, but it is unclear which parts the SPARQL end point makes available. That leaves three data sets where we have not been able to find a clear licensing/copyright/waiver statement, and for these, three letters have now been written (see Jenny's email) to inquire under what conditions those data sets can be redistributed, which the LODD wg is already doing. This involves the DailyMed, RXNorm, and WHO-GHO data sets. Input from those who composed is helpful here. One thing that we want to get clear is whether people can pull the data from the SPARQL end point, use/modify it, and even redistribute it. All in all, I think the (2 hour in the end) hack session was productive, and the licensing information has been updated in CKAN, where we also identified some wishes for improvements for CKAN, but that will be brought up on the CKAN mailing list. Thanx to all who were there and helped iron out licensing unclarities and helped with the letter. The final letters are linked to below, and if you helped on Monday then please do sign!
Egon -- Forwarded message -- From: Jenny Molloy jcmcoppic...@gmail.com Date: Wed, Mar 9, 2011 at 1:18 AM Subject: [open-science] LODD Hack Session Notes - Is It Open request signatories needed To: open-scie...@lists.okfn.org Dear All We had a very productive hack session on Monday night regarding linked open drug data. You can see the full notes here: http://okfnpad.org/sciencewg-loddhack-201103 In summary, we reviewed the openness of several LODD data sets in CKAN and identified those whose maintainer's should be sent an Is It Open Data? request. We drafted letters to send to the World Health Organisation Global Health Observatory and the maintainers of two datasets at the US National Library of Medicine: http://okfnpad.org/sciencewg-who-letter http://okfnpad.org/sciencewg-rxnorm-letter http://okfnpad.org/sciencewg-nlm-letter Before we send them via http://www.isitopendata.org/, it would be great to get more signatories from the group, so please add your name to the end of the generic letter on http://okfnpad.org/sciencewg-loddhack-201103 if you are happy to be included. Unfortunately, we didn't remind all of the hack session participants to do this before they left, so if you helped on Monday then please do sign! We will be sending the letters on Monday 14th March during a follow up session, of which more details are to follow. If there is a group on CKAN, or a general topic area that you feel would be a good target for future sessions of this nature, then please let me know! Jenny ___ open-science mailing list open-scie...@lists.okfn.org http://lists.okfn.org/mailman/listinfo/open-science -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Work/Hack session on LODD / IsItOpenData
On Sun, Feb 20, 2011 at 7:06 PM, Egon Willighagen egon.willigha...@gmail.com wrote: Please let me know if interested in joining the meeting,
Thanx to the many respondents! I have set up a preliminary etherpad with details: http://okfnpad.org/sciencewg-loddhack-201103 Please take notice of those... the meeting is just one hour, so the focus, I hope, will be on writing 'Is it Open Data?' letters, rather than being an informative/planning meeting. It's a true hack session. The letter may serve as an example to other projects. If you would like to join this hack session, you can just add your name and Skype account to the etherpad.
Egon
Re: Reminder: LODD telcon today
Hej Scott, On Wed, Feb 23, 2011 at 4:51 PM, M. Scott Marshall mscottmarsh...@gmail.com wrote: Agenda Best practices document for mapping life sciences data Data updates I like to bring up the meeting scheduled in March together with the OKF on looking at the Open Data nature of LODD data sets (and quite possible others too). I have set up a group at CKAN: http://ckan.net/group/hclsig_lodd which currently lists the 12 sets from the Best practices paper. I can't make the call, but will hang out on IRC and in the GDoc... Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Work/Hack session on LODD / IsItOpenData
Hi all, on February 16 there was an OpenScience Workgroup [0] meeting of the Open Knowledge Foundation, where we scheduled a work session for Monday March 7th at 19:00 GMT, for about an hour, to work on clarifying the data licensing of LODD data sets, such as those listed on CKAN. For some data sets the license is clear, others have non-commercial clauses (which is not considered Open Data), and others do not specify the terms, or are not open at all. For example, the diagram at the link below has a CAS node, which is not open data at all: http://www.w3.org/wiki/HCLSIG/LODD/Data In fact, the unclear license issue was one of the referee comments on the LODD contribution to the thematic issue on RDF in chemistry (coordinated by Matthias). Please let me know if interested in joining the meeting,
Egon
0. http://science.okfn.org/
CC0 RDF hosting service
Hi all, Mark Hahnel has extended his Science 3.0 website with a CC0 RDF hosting service. It is intended for small data sets, like results from experiments. http://www.science3point0.com/opendata/ It is still new, and most certainly not in its final form. Yet, this should be of such interest to the Open Science community that I had to forward the news here. The full announcement can be found at: http://www.science3point0.com/blog/2010/12/29/cc0-rdf-hosting-for-scientists/ With kind regards, Egon -- Dr E.L. Willighagen Postdoctoral Research Associate University of Cambridge Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: LODD Telcon
On Wed, Nov 10, 2010 at 4:45 PM, Matthias Samwald samw...@gmx.at wrote: (keeping in mind that we have a limit of only 1500 words for the main section). I was informed that we should consider this a guide line and are free to use more words if needed. Egon -- Dr E.L. Willighagen Postdoctoral Research Associate University of Cambridge Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: LODD Telcon
On Mon, Oct 25, 2010 at 3:32 PM, Susie Stephens susie.steph...@gmail.com wrote: Here's the reminder for Wednesday's LODD telcon. I'll have to cancel too... birthday... family obligations (forgot it... well, ignored it, I guess) Data update: * no new RDF from my side * other drug-related data: QM-calculated 3D structures: http://quixote.wikispot.org/, they will output RDF in their workflow JChemInf Paper: * Preliminary Communication is OK: details - http://www.jcheminf.com/info/instructions/?txt_jou_id=10170txt_mst_id=121837 * I like to see a federated SPARQL query for drug data... but haven't managed to sit down for that yet :( Apologies for missing this important call! Egon -- Dr E.L. Willighagen Postdoctoral Research Associate University of Cambridge Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: use of RDF in chemistry
On Thu, Sep 9, 2010 at 6:27 PM, Susie Stephens susie.steph...@gmail.com wrote: Chem2bio2rdf might have some data that they could share with you. Dave Wild is the PI for the project. And he was one of the speakers... Egon -- Dr E.L. Willighagen Post-doc @ Uppsala University (only until 2010-09-30) Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
use of RDF in chemistry
Hi all, two weeks ago I organized, with Martin Braendle (ETH/Zurich), a 1.5-day symposium on the use of RDF in chemistry at the American Chemical Society meeting in Boston, and was very happy that Eric Prud'hommeaux was there to (re)present HCLS / LODD. Several slide sets are now available online [0]. With kind regards, Egon Willighagen
0. http://egonw.github.com/acsrdf2010/
ESW wiki: Semantic Web extensions?
Hi all, the ESW wiki is running MediaWiki, but not Semantic MW, right? Could it be an option to actually get that installed and, umm... eat our own dog food? Samuel (my former student) worked on RDFIO during his Google Summer of Code project, and it should even be possible to get a SPARQL end point on the wiki that way: http://saml.rilspace.org/rdfio-040-released-gsoc-finished The context was a recent question about HCLS WG participation: * how many participants are there, and from how many different organizations? * how many of those organizations are academic and how many industrial? It would be cool to just pull that out from the ESW wiki pages, no? Egon -- Dr E.L. Willighagen Post-doc @ Uppsala University (only until 2010-09-30) Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Program for the RDF symposium at the American Chemical Society fall meeting
Dear all, it is my great pleasure to present the full symposium program for the RDF session at the American Chemical Society at the Boston meeting in August: http://egonw.github.com/acsrdf2010/ I am excited that on Monday afternoon Eric Prud'hommeaux will present the work of the LODD working group to the chemistry community. The symposium contains three half day sessions with topics on computing, ontologies, and applications, all chemistry oriented. The goal of the meeting is to get together people using RDF technologies in chemistry, and the list of talks from around the world shows that this goal has been reached. The program is diverse and exciting, and I am very much looking forward to meeting all participants to discuss challenges and cool solutions. People interested in joining, can sign up to the meeting mailing list, linked to on the homepage. Besides that the webpage is in XHTML+RDFa, the source is also available on GitHub (well, you really download the source code anyway), allowing people to happily fork, make changes, and perhaps make the page as triple-dense as is possible. Hoping to have informed you well, with kind regards from Uppsala, Egon Willighagen -- Post-doc @ Uppsala University Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: esw wiki changed to media wiki
On Tue, Apr 6, 2010 at 11:04 PM, Eric Prud'hommeaux e...@w3.org wrote: The long-awaited ESW wiki change from MoinMoin to Media Wiki has finally occurred. Media Wiki is the wiki upon which Wikipedia is based. It has tons of cool modules and is well-maintained. This change affects authentication and spam control. It is not a Semantic MediaWiki [0], or is it? Egon http://semantic-mediawiki.org/wiki/Semantic_MediaWiki -- Post-doc @ Uppsala University Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Last Call for Papers: RDF ACS conference, Boston 2010
Dear all, here is a quick reminder that in four days, the 28th, is the deadline of the abstract submission of the ACS RDF conference, as part of the chemical information (CINF) section of the American Chemical Society. With this email I would like to provide some further information, including the scope of the meeting, the 2nd Call for Papers (at the end), and the structure of a typical abstract. The ACS meetings typically have more than 10.000 participants, though the number of people attending the CINF symposia during those meetings is around 100. CINF meetings typically are about chemical information in general, though drug discovery applications take a prominent place. Other CINF symposia in Boston include: * Streamlining systems biology and cheminformatics approaches with high-throughput screening in lead discovery * The Emerging Concepts of Activity Landscapes and Activity Cliffs and Their Role in Drug Research * Leveraging Modeling and Informatics for Rare and Neglected Diseases * Where's the good stuff? Consumer health information social networking, resources and services * Biologics and Biosimilars: One in the Same? * Data-intensive drug design Symposia of the ACS Spring meeting, ongoing right now in San Francisco, include: * Green Chemistry: Multidisciplinary use of chemical information resources * Data visualization * Libraries and large scale digitization initiatives (LSDIs) * Metabolomics * The Future of Scientific Publishing As should be clear, the scope is very broad. I'd say the many HCLS activities are relevant. The meeting is oriented at getting the various groups involved in RDF applications in chemical and molecular sciences together, perhaps as suggested best practices. Abstracts can consist of a single A4 page with title, authors and affiliation, and perhaps a few references to relevant literature. 
Further information about the full meeting can be found at: http://portal.acs.org/portal/acs/corg/content?_nfpb=true_pageLabel=PP_ARTICLEMAINnode_id=2060content_id=CNBP_023925use_sec=truesec_url_var=region1__uuid=70ff72d1-db73-471b-8855-f3b5f1c4fba3 Looking forward to your abstract submissions, with kind regards, Egon Willighagen -- 2nd Call for Papers: Semantic Chemistry with the Resource Description Framework (CINF Symposium, ACS Autumn 2010) 240th ACS National Meeting & Exposition We now invite papers for our symposium on the use of the Resource Description Framework (RDF) technologies in semantic knowledge representation and data exchange in chemistry at the 240th National Meeting & Exposition of the American Chemical Society (ACS) in Boston this fall. Semantic Chemistry has been around for a while, but is seeing a revival with the adoption of the Resource Description Framework (RDF) and matching technologies in chemistry. RDF triples provide a simple structure that allows data and knowledge alike to be presented in a single framework. Derived technologies include the capturing of ontologies with the Web Ontology Language (OWL) and performing queries with SPARQL. A wide variety of free and open source products make it easy to set up servers with large amounts of RDF data, while integration with HTML is available too with RDFa. The RDF symposium at the 240th ACS national meeting in Boston invites submissions of talks about the use of RDF in chemistry and cheminformatics. Topics could include the use of OWL ontologies, OWL axioms, reasoning and inference, RDF in user interfaces, such as RDFa in web front ends, visualization, querying systems, and applications thereof, such as linking data sets, compound classification, cloud computing, web services, data aggregation, semantic publishing, and literature mining. Abstracts may be submitted via http://abstracts.acs.org/ now. You’ll find the RDF session as part of the CINF division symposiums. 
The submission deadline is March 28, 2010. In case of questions, please email Egon Willighagen at egon.willighagen[A]farmbio.uu.se or Martin Braendle at braendle[A]chem.ethz.ch. -- -- Post-doc @ Uppsala University Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Fwd: [open-science] Launch of the Panton Principles for Open Data in Science + Is It Open Data?
Hi LODD friends, I think the post below is of interest to our recent discussion... I would especially like to point to the service listed at the bottom of the post: http://www.isitopendata.org/ Egon -- Forwarded message -- From: Jonathan Gray jonathan.g...@okfn.org Date: Fri, Feb 19, 2010 at 11:59 AM Subject: [open-science] Launch of the Panton Principles for Open Data in Science + Is It Open Data? To: open-science open-scie...@lists.okfn.org Hi all, We're pleased to announce the Panton Principles for Open Data in Science: http://blog.okfn.org/2010/02/19/launch-of-the-panton-principles-for-open-data-in-science/ http://www.pantonprinciples.org/ The Panton Principles were authored by Peter Murray-Rust, Cameron Neylon, Rufus Pollock and John Wilbanks at the Panton Arms on Panton Street in Cambridge, UK - with input from the Working Group on Open Data in Science. You can endorse the principles at: http://www.pantonprinciples.org/endorse We'd be most grateful for any help disseminating the principles - by blogging, microblogging, forwarding to relevant colleagues and so forth. The 'Is It Open Data?' service, which allows anyone to make and publicly record enquiries about the openness of (scientific) datasets, is also now live at: http://www.isitopendata.org/ All the best, -- Jonathan Gray Community Coordinator The Open Knowledge Foundation http://blog.okfn.org http://twitter.com/jwyg http://identi.ca/jwyg ___ open-science mailing list open-scie...@lists.okfn.org http://lists.okfn.org/mailman/listinfo/open-science -- Post-doc @ Uppsala University Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
LODD: new ChEMBL SPARQL end point, requirements for embedding in LODD?
Hi all, as indicated on the LODD call, I had some trouble getting the ChEMBL SPARQL end point somewhat faster. The end point has moved to a new, more powerful Virtuoso server, also with more sane server settings: http://chem-bla-ics.blogspot.com/2010/02/chembl-rdf-1sparql-end-point.html The RDF graph links out to Bio2RDF, and the obvious next step is to link it into the LODD network... However, I do not really seem to find requirements... I assume the regular LD [0] rules apply? The ChEMBL data is only available via SPARQL right now, but it is a start... (the rest is a bit of PHP wrapping. BTW, anyone aware of a simple PHP lib for that? It's not much code, but I would rather reuse anyway.)... Second thing is how to provide links... the InChI is mentioned (mis-capitalized at [1], which makes me wonder who I should ask for permission to make (minor) changes to the wiki), but many DBs do not have URIs which include the InChI, making it rather difficult to link to that resource in a general, independent way... that is, there is no [2] matching http://www4.wiwiss.fu-berlin.de/drugbank/snorql/?describe=http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00010. More generally, is there a requirement and/or policy on how the resources should be linked up? 
Looking forward to hearing from you, Egon 0.http://www.w3.org/DesignIssues/LinkedData.html 1.http://esw.w3.org/topic/HCLSIG/LODD/Data/ 2.http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/InChI=1/C149H246N44O42S/c1-20-77(13)116(191-122(211)81(17)168-132(221)104(66-113(204)205)178-121(210)79(15)167-123(212)88(152)62-84-39-43-86(198)44-40-84)145(234)185-102(63-83-32-23-22-24-33-83)138(227)193-118(82(18)197)146(235)186-103(65-111(155)202)137(226)189-108(71-196)142(231)182-101(64-85-41-45-87(199)46-42-85)136(225)175-93(38-31-56-165-149(161)162)126(215)174-91(35-26-28-53-151)131(220)190-115(76(11)12)143(232)184-97(58-72(3)4)124(213)166-68-112(203)170-94(47-49-109(153)200)128(217)180-100(61-75(9)10)135(224)188-106(69-194)140(229)169-80(16)120(209)172-92(37-30-55-164-148(159)160)125(214)173-90(34-25-27-52-150)127(216)179-99(60-74(7)8)134(223)181-98(59-73(5)6)133(222)176-95(48-50-110(154)201)129(218)183-105(67-114(206)207)139(228)192-117(78(14)21-2)144(233)177-96(51-57-236-19)130(219)187-107(70-195)141(230)171-89(119(156)208)36-29-54-163-147(157)158/h22-24,32-33,39-46,72-82,88-108,115-118,194-199H,20-21,25-31,34-38,47-71,150-152H2,1-19H3,(H2,153,200)(H2,154,201)(H2,155,202)(H2,156,208)(H,166,213)(H,167,212)(H,168,221)(H,169,229)(H,170,203)(H,171,230)(H,172,209)(H,173,214)(H,174,215)(H,175,225)(H,176,222)(H,177,233)(H,178,210)(H,179,216)(H,180,217)(H,181,223)(H,182,231)(H,183,218)(H,184,232)(H,185,234)(H,186,235)(H,187,219)(H,188,224)(H,189,226)(H,190,220)(H,191,211)(H,192,228)(H,193,227)(H,204,205)(H,206,207)(H4,157,158,163)(H4,159,160,164)(H4,161,162,165)/f/h157,159,161,163-193,204,206H,153-156,158,160,162H2 -- Post-doc @ Uppsala University Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
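To make the linking question above a bit more concrete, here is a minimal sketch (Python, standard library only) of one way two data sets could be joined on a shared InChI and turned into owl:sameAs triples. It is not how ChEMBL, DrugBank or LODD actually do it: the example.org URIs and the two small InChI strings are made up for illustration, and the function name link_by_inchi is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def link_by_inchi(left, right):
    """Return N-Triples owl:sameAs lines for resources that share an InChI.

    Both arguments map a resource URI to its InChI string.
    """
    # Index the right-hand data set by InChI so the join is a dictionary lookup.
    by_inchi = {}
    for uri, inchi in right.items():
        by_inchi.setdefault(inchi, []).append(uri)

    triples = []
    for uri, inchi in sorted(left.items()):
        for other in sorted(by_inchi.get(inchi, [])):
            triples.append(f"<{uri}> <{OWL_SAME_AS}> <{other}> .")
    return triples


# Hypothetical resources; neither URI scheme is taken from a real service.
dataset_a = {
    "http://example.org/a/molecule/1": "InChI=1S/CH4/h1H4",
    "http://example.org/a/molecule/2": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
}
dataset_b = {
    "http://example.org/b/drug/X": "InChI=1S/CH4/h1H4",
}

for line in link_by_inchi(dataset_a, dataset_b):
    print(line)
```

The join key is the InChI literal itself, so neither data set needs the InChI in its URIs; whether owl:sameAs is the right predicate for such links is exactly the kind of policy question raised in the message.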
Replies from Bio2RDF about contact with upstream data providers
Hi all, happy new year! On Mon, Jan 4, 2010 at 3:44 PM, Susie Stephens susie.steph...@gmail.com wrote: == Agenda == * Open data follow up - all * Data update - Anja, Jun, Matthias, Egon I have exchanged email with Peter Ansell of the Bio2RDF project and copy/pasted his replies below. *** Are you in contact with upstream providers? E.g. are they aware you rdf-ied their data? *** Only in cases where they do not offer licenses that we can use without telling them as far as I know. Some, like the full NLM pubmed license require that we ask, so in those cases they know. *** How do you propagate licenses and copyright? I know you have the data blobs nicely separated, so no problems with license incompatibility, but I did not see copyright/license statements mentioned on the RDF pages (or HTML conversion), nor in the list at [0]. Will copyright/license information be added to that list at [0]? 0. http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Namespace *** Quite a few of the pages, but not all, have a triple added that indicates where the license is to be found. We use the http://creativecommons.org/ns#license predicate to indicate the license, even if the license is not a CC license. It fits better IMO than dc:license and definitely better than xhtml:license. See http://bio2rdf.org/go:345 for an example, with the license redirect URL http://bio2rdf.org/license/go:345 redirecting in this case to http://www.geneontology.org/GO.cite.shtml. We do a redirect to the license because that is the easiest method, not that we couldn't do it directly. I prefer to have the ability to redirect licenses based on both the namespace and the identifier, particularly in the case of SIDER for example, where there are two datasets with different licenses in the same namespace because that is how it works. 
The current list that is used to autogenerate the license triples, although it should definitely be expanded, can be found in RDF at http://bio2rdf.svn.sourceforge.net/viewvc/bio2rdf/trunk/src/war/WEB-INF/base-bio2rdf-providers-licenses-config.n3?view=markup All of the providers there insert the static RDF/XML that is defined at http://qut.bio2rdf.org/query:license, but another query could be used if there were specific conditions for particular datasets, as there will be with pubmed soon. *** Do upstream providers have preferences regarding how you put in the license? *** Not that I know of in most cases. The 2009 Pubmed License has a few new provisions though, so there are some cases that have different providers. *** Have you talked with upstream providers about changing licenses to reduce license conflicts? *** I can understand providers not wanting you providing their actual information in RDF, but I can't understand them thinking that they can have control over how people relate their personal datasets to their information in small amounts. If the linking is major then we could be in the situation that CAS tried to get into with Wikipedia, with CAS giving Wikipedia a special agreement. http://www.cas.org/newsevents/caswikipedia.html What they don't realise is that Wikipedia releases the information under the same license so it is totally free from that point on, and CAS cannot go back on the agreement if anyone can prove that they helped with the CAS number insertions on Wikipedia. *** Do all upstream databases provide open/free licensing? *** I only found three databases that we are currently offering for download that I will have to check up with Marc-Alexandre about the license conditions [...]. The majority of the databases seem to have the equivalent of CC-BY-NC on it, although they don't actually use Creative Commons licenses. 
-- Egon -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
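For readers who want to see what such a license triple looks like, here is a small sketch that rebuilds the go:345 example from the reply above as one N-Triples line. The predicate and the two bio2rdf.org URL patterns are taken from Peter's answer; the function name license_triple and its parameters are hypothetical, and this is not the Bio2RDF code itself.

```python
CC_LICENSE = "http://creativecommons.org/ns#license"


def license_triple(namespace, identifier, base="http://bio2rdf.org"):
    """Build one N-Triples line pointing a record at its license redirect URL."""
    resource = f"{base}/{namespace}:{identifier}"
    # The redirect URL carries both namespace and identifier, so two data sets
    # in the same namespace can still resolve to different licenses.
    license_url = f"{base}/license/{namespace}:{identifier}"
    return f"<{resource}> <{CC_LICENSE}> <{license_url}> ."


print(license_triple("go", "345"))
```

Keeping the identifier in the license URL is what makes the SIDER case mentioned above workable, at the cost of one extra HTTP redirect per lookup.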
Re: LODD Telcon
On Wed, Jan 6, 2010 at 5:59 PM, Susie Stephens susie.steph...@gmail.com wrote: Minutes from the LODD telcon are now available. http://esw.w3.org/topic/HCLSIG/LODD/Meetings/2010-01-06_Conference_Call I heard about the conflict of universities trying to protect commercial interests *and* use CC-NC material on twitter, and asked for the link, which is: http://brains.parslow.net/node/1581 So, the details seem to be in the fact that some universities want to be qualified as 'commercial'... On the call there was talk that for-research-only it should be OK, but I do not believe redistributing still qualifies as for-research-only... Egon -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: LODD Minutes
On Wed, Nov 25, 2009 at 5:56 PM, Susie Stephens susie.steph...@gmail.com wrote: Minutes from today's LODD call are now available. http://esw.w3.org/topic/HCLSIG/LODD/Meetings/2009-11-25_Conference_Call Did my email of yesterday not reach the list? Egon -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: LODD Telcon
Hi all, next Wednesday I unfortunately cannot participate because of family obligations. On Mon, Nov 23, 2009 at 5:19 PM, Susie Stephens susie.steph...@gmail.com wrote: Here's the reminder for Wednesday's LODD telcon. I was up for a data update, so I will have to do it like this... My introduction to this list is ancient, so that comes first. My background is cheminformatics and chemometrics (statistics/data analysis on chemical data). I'm a strong believer in Open Data, Open Source and Open Standards, and (past) developer of several projects, including Strigi-chemical (chemistry extension for the KDE desktop search engine), the Chemistry Development Kit, JChemPaint, Jmol, and several other ones. Right now, I am a postdoc in a drug discovery group at Uppsala University (Prof. Wikberg) and developing the cheminformatics use at the department, which includes the Bioclipse workbench. Proteochemometrics is the main statistical method used in our group, and model validation is clearly important. This is where RDF comes in: aggregation of data before model building, and for model validation afterwards. The latter will preferably be data which is related to the model, and not really of the same type. RDF is clearly one of the few methods up to this job. When I first joined the HCLS mailing list and conf calls, I saw very much focus on biological data, clinical data, but a lack of focus on the molecular chemistry behind it all, which is actually crucial for the cheminformatics and proteochemometrics. So, that more or less defines the area where I contribute to the RDF activities... the border of molecular data and drug-related properties. So far, I have developed an extension for Bioclipse to deal with RDF, and it currently supports an in memory triple store, SPARQL queries on the in memory stores as well as on remote SPARQL end points. 
Like most of Bioclipse2, it is scriptable, which allows easy building of small programs or workflows to integrate RDF into other Bioclipse extensions, including the cheminformatics functionality, but also Jmol. There is also an R interface, to bridge with statistical modeling. Last week Friday, I gave a talk about this work at SWAT4LS in Amsterdam, and my slides are available in my blog [0]. Getting back to the data, I am working on making various unique molecular property resources available as RDF. This includes the GNU FDL-licensed NMRShiftDB data, which contains NMR spectra (mostly carbon-13) used for metabolite identification (think finding biomarkers). There are also two smaller CC0 data sets, one based on ChemPedia [1], a new crowd-sourcing endeavor for naming molecules (no i18n support yet, but requested), and the RDF Open Notebook Science Solubility project [2], which we described in a Chapter in the recent Beautiful Data book from O'Reilly. There are other things I am doing, which include an ontology for molecular (or QSAR) descriptors, and an RDF equivalent for the cheminformatics data model used by the CDK. This would, though I am myself not convinced this is really where we want to go, allow serialization of full molecular structures as RDF data, though parts of this may very well be rather useful for XHTML+RDFa for scientific publication of, for example, organic synthesis papers... I'd very much like to help get these data sets into the LODD network (particularly the last two, which are easiest because of the CC0 license). One thing I want to do soon (actually, as part of the SWAT4LS proceedings paper), is create a data set with CDK-based molecular similarities. The CDK can calculate various similarities, and this will create a nice sparse matrix. I'm leaning towards doing the molecules in DBPedia, but am more than open to analysing other Open data sets too (bearing a proper license, or proper Public Domain statement, like CC0). 
I'll put up the final script on MyExperiment.org anyway, for others to analyze other data sets. No ETA for that, though. An example script downloads molecules from DBPedia and visualizes them 2D in a molecule table [3,4]. I am looking forward to hearing your comments and ideas on this work. Regards, Egon 0.http://chem-bla-ics.blogspot.com/2009/11/swat4ls-linking-open-drug-data-to.html 1.http://chem-bla-ics.blogspot.com/2009/11/chempedia-rdf-1-sparql-end-point.html 2.http://chem-bla-ics.blogspot.com/2009/11/open-notebook-science-solubility-sparql.html 3.http://egonw.posterous.com/molecules-in-dbpedia-visualized-with-bioclips 4.http://www.myexperiment.org/workflows/927 -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
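The "nice sparse matrix" of molecular similarities mentioned above can be sketched as follows. This is plain Python standing in for the CDK (which is a Java library): fingerprints are assumed to be sets of "on" bit positions, the Tanimoto coefficient is one common similarity measure and not necessarily the one the final data set will use, and the molecule ids, fingerprints and the 0.5 threshold are invented for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sparse_similarities(fingerprints, threshold=0.5):
    """Return {(id_a, id_b): score} for pairs at or above the threshold.

    Pairs below the threshold are simply not stored, which is what keeps
    the resulting matrix sparse.
    """
    result = {}
    for id_a, id_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[id_a], fingerprints[id_b])
        if score >= threshold:
            result[(id_a, id_b)] = score
    return result


# Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
fingerprints = {
    "mol1": {1, 2, 3, 4},
    "mol2": {2, 3, 4, 5},
    "mol3": {9, 10},
}

print(sparse_similarities(fingerprints))
```

Each stored pair could then be written out as RDF, for example one resource per pair carrying the two molecule URIs and the score, though that modelling choice is left open here.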
Re: [hcls] Updated wiki page for HCLS Knowledge Base
On Wed, Oct 14, 2009 at 11:30 AM, Matthias Samwald samw...@gmx.at wrote: that said, I also don't think the final SPARQL end point should be remote at all, So where should the final SPARQL end point be located? In a server inside the intranet of each organization? On the client side? How should it be filled? By crawling linked data resources? Please specify. The current scientific practice is to set up your input data first, and then do analysis... I have yet to see any scientist do it differently. Projecting this to RDF, the input would be a single SPARQL end point. But since the scientist does want to aggregate and preprocess the data to his particular wishes and needs, *this* SPARQL end point will be local, so, yes on the client side. *How* the scientist will fill this local repository highly depends on his wishes too. This will likely be a mix of remote SPARQL queries, RDFa for extracting data from this new journal paper in Nature (...), some local RDF files (and perhaps an institutional SPARQL, though those resources seem to be rather unused so far, perhaps because they do not have SPARQL end points yet), some properties calculated locally and/or remotely which he needs too, etc. So, yes, by crawling the cloud for data. Point is: crawling will and must be a central part of the process. And as such, both Linked Data spread around the web *and* SPARQL end points will go hand in hand. But I disagree that SPARQL end points are what we should aim at as data providers, as scientists will never use them as such anyway. Just think of it like this: if you aggregated the data already in the way the scientist wants it, he is no longer doing cutting edge science (it's already been done!). Yes, analysis goes beyond the aggregation, but to make your scientific point, you will provide counter arguments based on *external* data, hence the crawling... 
Egon -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
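The "fill a local store by crawling" idea argued for above can be sketched as a small loop. Everything here is an assumption for illustration: the local store is just a Python set of triples rather than a real triple store, fetch is a stand-in for whatever retrieves triples for a URI (a remote SPARQL query, an RDFa extractor, a local file), and FAKE_WEB replaces the actual web so the example runs offline.

```python
def aggregate(seeds, fetch, max_rounds=3):
    """Fill a local set of triples by following object URIs from the seeds.

    fetch(uri) must return an iterable of (subject, predicate, object) tuples.
    Crawling stops when no new URIs turn up or after max_rounds rounds.
    """
    store, seen = set(), set()
    frontier = set(seeds)
    for _ in range(max_rounds):
        if not frontier:
            break
        next_frontier = set()
        for uri in sorted(frontier):
            seen.add(uri)
            for s, p, o in fetch(uri):
                store.add((s, p, o))
                # Only objects that look like URIs are candidates for more crawling.
                if o.startswith("http://") and o not in seen:
                    next_frontier.add(o)
        frontier = next_frontier - seen
    return store


# Hypothetical two-document "web" used instead of real HTTP requests.
FAKE_WEB = {
    "http://example.org/a": [
        ("http://example.org/a", "http://example.org/p/linksTo", "http://example.org/b"),
    ],
    "http://example.org/b": [
        ("http://example.org/b", "http://example.org/p/label", "B"),
    ],
}


def fetch(uri):
    return FAKE_WEB.get(uri, [])


local_store = aggregate(["http://example.org/a"], fetch)
print(len(local_store))
```

The max_rounds cut-off is the crude part: deciding when to stop crawling and start analysing is an open question in this thread, and a fixed round limit is only a placeholder for a better answer.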
Re: [hcls] Updated wiki page for HCLS Knowledge Base
On Tue, Oct 13, 2009 at 4:13 PM, Mark ma...@illuminae.com wrote: On Tue, 13 Oct 2009 03:34:01 -0700, Matthias Samwald samw...@gmx.at wrote: Besides, even though linked data URIs and federated queries are nice, it is quite practical to have all relevant datasets accessible through a single SPARQL endpoint. :-) I still find statements like this amusing, especially from the flagship organization representing the Semantic Web for healthcare and life sciences. I do understand *why* statements like this are made, and I do understand how much easier it is to demonstrate the Semantic Web if you remove Web from the equation... but it still irks me LOL! I second that :) Egon -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: [hcls] Updated wiki page for HCLS Knowledge Base
On Tue, Oct 13, 2009 at 5:14 PM, Matthias Samwald samw...@gmx.at wrote: Back in 2006 in the Creeps paper, Ben and I wrote that the SWHCLS community has spent too much time focusing on Semantic rather than on Web... and it really hasn't changed much nearly 4 years later :-( I honestly think that, if we all pulled in the same direction, we could make the Web aspect of the Semantic Web work better than it currently does... This is quite dissonant with the impressions that I have. The Linked Data paradigm is very popular, and a lot of work in this area (as well as query federation) is going on at the moment. What exactly are you missing in the work that is currently going on? You wrote: Besides, even though linked data URIs and federated queries are nice, it is quite practical to have all relevant datasets accessible through a single SPARQL endpoint. Linked Data focuses on crawling the web. At least, that's the impression I have... yet, a single store to query is indeed much more convenient... it's sort of contradicting: I can appreciate Mark's comment, as we have yet to come up with good solutions for when to stop crawling and start analyzing the data... the crawling is iterative, and each new analysis step may trigger new queries (and probably should)... A single federated query is not what I expect to be the final solution; instead, I expect an iterative process, where possible steps may be federated, but iterative nevertheless... Having one single SPARQL end point indicates the crawling is done, where we have only just started linking things together... that said, I also don't think the final SPARQL end point should be remote at all, and it steps over the current data licensing issues we still (unfortunately) have to deal with... And then I just think about remote services calculating things on the fly, which will likely not be part of a SPARQL end point anyway... Mark, or perhaps things like SADI should have a SPARQL interface? :) Anyways... 
looking forward to meeting some/many of you in Amsterdam! Egon -- Post-doc @ Uppsala University Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Linking Open Drug Data wins First Prize of Triplification Challenge
Hi Anja! On Sun, Sep 6, 2009 at 12:05 AM, Anja Jentzsch a...@anjeve.de wrote: the Linking Open Drug Data Task Force just won the first prize of the Linking Open Data Triplification Challenge [1] which took place at the I-Semantics in Graz. The paper we submitted can be found online [2] as well as the talk [3]. First of all, congrats on the win! Thanks to the LODD group for all the hard work and commitment to the project. It is a pleasure working with you! I have a question about the title of the email... it is now entering the social web as title too, and I was wondering about the Open Data nature of the data... the wiki page [4] does not provide any license information or copyright statement, or any other claim about the user's rights to modify (extend, fix typos, ...) and redistribute, two rather important aspects of Open Data [5]. In particular, I was not aware that the DrugBank data actually was Open. That said, I am not entirely sure how the LODD name of the task force came about and whether it actually attempts to identify itself with Open Data, or merely downloadable data. Can you please elaborate on these issues? Egon 4.http://esw.w3.org/topic/HCLSIG/LODD/Data 5.http://en.wikipedia.org/wiki/Open_Data -- Post-doc @ Uppsala University Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers
Re: Can RDFa be used on XML: pharma information
On Tue, Jun 23, 2009 at 11:20 AM, Rick Jelliffe rjelli...@allette.com.au wrote: I am working on improving the semweb markup on an Australian government Department of Health and Aging website, which has HTML and XML versions of the medicines allowed for prescription and the amount the government pays. It has various links to interesting documents, and we want to make it more semweb friendly. Here are two example pages to give you the idea (they have different selections of data): http://www.pbs.gov.au/html/consumer/search/results?term=Zyprexa%20Zydispublication=GE http://www.pbs.gov.au/xml/consumer/search/results?term=Zyprexa%20Zydispublication=GE We are doing some general things like improving the microformats (DC and hproduct) in the HTML. But the plan was to decorate the XML (which has extra information) with the appropriate RDFa, which seems perfect. But now I see that the RDFa spec says that RDFa is designed for use on XHTML. We do not want to use it that way, we want to augment the XML. So I was wondering if anyone here had any advice? I see the choices Instead of the XML end point, I would express all that content as RDF (possibly in the XML format). If you need the XML for the metadata info on the request, you could consider putting an RDF element somewhere in your custom XML. Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: Can RDFa be used on XML: pharma information
On Tue, Jun 23, 2009 at 11:49 AM, Rick Jelliffe rjelli...@allette.com.au wrote: So there is still no convenient way to mark up existing XML as RDF? It was a showstopper 10 years ago but I kind of expected there would have been some progress... sigh Define 'markup'... you can just embed your RDF in your XML, using RDF/XML... the namespacing is the indication what is RDF and what is not... no other 'markup' needed... Can you elaborate on the inconveniences you talk about a bit more? That makes providing solutions easier... Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: Can RDFa be used on XML: pharma information
On Tue, Jun 23, 2009 at 12:15 PM, Rick Jelliffe rjelli...@allette.com.au wrote: Markup = annotation. Taking existing data and adding stuff to make it more useful, without disrupting existing uses of that data (and without creating the size/maintenance issues you get from duplication.) One of the rationales for this project is to make more effective use of bandwidth, which makes me lean against duplication somewhat, but it may indeed be the appropriate way. OK, so the requirement is to: 1. stick with the current XML, 2. provide RDF/XML. I think the XSLT route proposed by others is the way to go then, making a third end point, which would take the current XML as input and convert it with XSLT to RDF/XML. Using RDF/XML has the advantage here that you can validate the output of your XSLT stylesheet too, increasing your chances of detecting typos etc. Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
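The conversion step described above (existing XML in, RDF/XML out on a separate end point) can be sketched in a few lines. The thread proposes XSLT; the sketch below swaps in Python's standard-library ElementTree to show the same idea, since no stylesheet appears in this thread. The input element names (results, medicine, name), the id attribute, the example.org base URI, and the choice of dc:title are all assumptions for illustration; the real PBS XML schema is not shown here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical input document; the element names are assumptions.
SAMPLE_XML = """<results>
  <medicine id="example-1">
    <name>Zyprexa Zydis</name>
  </medicine>
</results>"""


def qname(namespace, local):
    """Build the '{namespace}local' tag form that ElementTree expects."""
    return "{%s}%s" % (namespace, local)


def to_rdf_xml(source_xml, base_uri="http://example.org/medicine/"):
    """Convert the (assumed) custom XML into a small RDF/XML document."""
    # Ask ElementTree to serialise with the conventional rdf: and dc: prefixes.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    rdf_root = ET.Element(qname(RDF_NS, "RDF"))
    for medicine in ET.fromstring(source_xml).iter("medicine"):
        # One rdf:Description per medicine; its URI becomes the subject.
        description = ET.SubElement(
            rdf_root,
            qname(RDF_NS, "Description"),
            {qname(RDF_NS, "about"): base_uri + medicine.get("id")},
        )
        # The medicine name becomes a dc:title literal.
        title = ET.SubElement(description, qname(DC_NS, "title"))
        title.text = medicine.findtext("name")
    return ET.tostring(rdf_root, encoding="unicode")


rdf_xml = to_rdf_xml(SAMPLE_XML)
print(rdf_xml)
```

Because the output is ordinary RDF/XML, it can be parsed back and checked by any RDF or XML tool, which is the validation advantage mentioned above.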
Re: Can RDFa be used on XML: pharma information
On Tue, Jun 23, 2009 at 2:48 PM, Rick Jelliffe rjelli...@allette.com.au wrote: I see that the 2008 draft http://www.w3.org/2006/07/SWD/RDFa/rdfa-overview says RDFa itself is intended to be a technique that allows for adding metadata to any (XML) markup document, including SMIL, RSS, SVG, MathML, etc. Note, however, that in the current state, RDFa is being defined only for the (X)HTML family of languages. So I think I will go ahead and add some RDFa markup to the XML, so that there is some data on the web which might stimulate developers or inform them, and tell the client that we may need to change tack. The problem here is to define what attributes your XML will use to define the RDFa hooks... what attributes will define a new subject, the predicate, and how you define the object... Because the XML is using a local namespace, it will be unrecognizable for any client... however, once you define those attributes (or new elements), you should be able to embed this RDFa in the HTML more easily too... Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
drug side effects
I was just told about this: http://sideeffects.embl.de/ Rather permissively licensed. SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications as well as links to further information, for example drug–target relations. Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: HCLS Telcon
Hi Scott (and others), On Thu, Apr 2, 2009 at 12:42 AM, M. Scott Marshall marsh...@science.uva.nl wrote: Here's the reminder for Thursday's HCLS call. I will not be able to make it today. And I think I found which telcons are most interesting to me (LODD and BioRDF), though several others have my interest too. The last two weeks I have been working from home a bit more, with no option of dialing in, so I have been attending as much as possible from IRC. However, I found it difficult to use this mechanism to report on the things I have been up to... So, I was wondering, what would be the best way to do this? Just send an email to this list, and mention that:
- I converted NMRShiftDB.org into RDF, with NMR spectra for small, drug-like molecules
- linked to that, Bio2RDF and to ChemSpider from rdf.openmolecules.net
- we are working on converting the StARLite DB into RDF, which holds drug-assay-protein-proteinClass relations (though that will not get online before summer)
- extending Bioclipse with RDF support using Jena, to allow visualization of structural data, e.g. 3D protein structures, ligands in 2D/3D
and just ask for feedback? (Thanx to those who already gave feedback via other channels! And thanx to Kingsley for using the rdf.openmolecules.net data in one of his applications.) For the rest, I have the following on my todo list:
* write up my views on similarity of molecules in the RDF world
* write up what I think of SKOS with respect to OWL, and how we used SKOS in MetWare for those reasons
* write up what my impression is of SWAN
It's been grant hunting season here, so I have been delayed with the above things. Sorry about that. Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: blog: semantic dissonance in uniprot
On Thu, Apr 2, 2009 at 5:35 PM, Michel_Dumontier michel_dumont...@carleton.ca wrote: Actually, I'd say OWL is to blame here... that is, the OWL class was not properly defined. Just to clarify - it's not OWL that's the problem. It's the representation of Chemistry in a formal logic-based language where it actually matters what you say and how you say it. Yeah, sorry, I knew I had to phrase that more correctly... it's not the OWL standard, but whatever had been defined using OWL. These things are pretty tricky, and if you read the IUPAC Gold Book on definitions, it will not get much clearer either; there will be plenty of use of owl:sameAs and all the alternatives that define a looser similarity to capture current terms... Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: Announcement: Bio2RDF 0.3 released
On Mon, Mar 23, 2009 at 12:09 AM, Peter Ansell ansell.pe...@gmail.com wrote: 2009/3/22 Egon Willighagen egon.willigha...@gmail.com: On Sun, Mar 22, 2009 at 1:42 AM, Peter Ansell ansell.pe...@gmail.com wrote: Do you also provide InChIKey resolution? No. That requires look up, so only works against an existing database. Chemspider is doing this, but is not a general solution. InChIKeys are not unique, though clashes are rare and have not been observed so far. I didn't think it required a lookup to derive an InChIKey given an InChI. Ah, sorry. InChIKey can be computed, but I thought you meant resolving what structure has a given InChIKey... going from InChIKey to structure does require lookup, generation of an InChIKey from a structure (or InChI) does not. I realise that clashes are rare but possible, just wondering whether it would be supported. Leaving them out altogether just seems like missing possibly extra information. I'll add them where missing. [1] It is just that InChIs can get pretty long for complex molecules and it makes it harder for people to accurately copy and paste them around when needed. Indeed. However, InChIKey is less precise. Since RDF allows us to do things in an exact manner, I would rather use InChI. InChIKeys might be better for general use in RDF because they have a guaranteed identifier length and therefore won't become cumbersome for complex molecules. But they can never be used for owl:sameAs-like relations. Having them as properties could give someone a quick clue as to whether they are looking at the same molecule. Humans do interact with RDF (inevitably), and having short hash values can still be valuable. Given that hashes are usually designed to amplify small changes, it is easier than reading a 10 line InChI to determine whether there was a difference. Agreed. Currently all of the InChIs that I have seen have been as Literals, but it would be relatively easy to also provide them as URIs to provide the link since you have a resolver for them set up.
That was precisely the reason why I started the service. Good work. Thanx for the feedback! Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
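The asymmetry discussed in this exchange (a key can be computed from an InChI without any lookup, but going from the key back to the structure needs a table) can be illustrated with a toy hash. This is explicitly not the real InChIKey algorithm; toy_key is a hypothetical stand-in that simply truncates a SHA-256 digest to show the fixed length and the one-way direction. The two InChI strings are the ones quoted elsewhere in these threads.

```python
import hashlib


def toy_key(inchi, length=14):
    """Fixed-length, one-way key for an InChI string.

    NOT the real InChIKey algorithm; a truncated SHA-256 digest is used
    here only to illustrate the properties discussed in the thread.
    """
    digest = hashlib.sha256(inchi.encode("utf-8")).hexdigest()
    return digest[:length].upper()


methane = "InChI=1/CH4/h1H4"
larger = "InChI=1/C12H8O2S/c13-8-5-6-10-11(12(8)14)7-3-1-2-4-9(7)15-10/h1-6,13-14H"

# Computing a key needs no database: it is a pure function of the InChI.
keys = [toy_key(methane), toy_key(larger)]

# Going back from key to InChI does need a lookup table of known molecules.
lookup = {toy_key(inchi): inchi for inchi in (methane, larger)}

print(len(keys[0]), len(keys[1]))   # same length, whatever the InChI size
print(lookup[toy_key(methane)])     # resolves only because it was indexed
```

Because the key is a truncated hash, two different InChIs could in principle map to the same key, which is why the thread treats such keys as a convenience property rather than as grounds for an owl:sameAs statement.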
Re: Announcement: Bio2RDF 0.3 released
Hi Kei, On Mon, Mar 23, 2009 at 3:37 PM, Kei Cheung kei.che...@yale.edu wrote: As part of the biordf query federation task, we are currently exploring a federation scenario involving integration of neuroreceptor-related information. For example, IUPHAR provides information for different classes of receptors. For example, in the table shown at http://www.iuphar-db.org/GPCR/ReceptorListForward?class=class%20A, ligands are provided for receptors but not InChI codes ... That's an interesting table... not Open it seems... did you ask for (and get) permission to redistribute under a free license, perhaps? The list is not overly long, and InChIs could be added manually, though one would have to assume the compound names (btw, some are compound classes!) are unique... PubChem also has links to MeSH terms, and I also see a MeSH term in the ChemBox on Wikipedia... that would be open data, and could provide similar information. I have been pondering setting up an open source semantic wiki for linking data, for cases where no Open source for that data is available, but have not had time for that yet. Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: blog: semantic dissonance in uniprot
On Sat, Mar 21, 2009 at 5:01 AM, eric neumann ekneum...@gmail.com wrote: There is no such thing as a referenceable instance of a specific instantiated molecule (that specific molecule); all gene, protein, and chemical records are about the category or group of exemplar molecules: SAME molecular structure, NOT SAME atoms (so we already aren't really things in the real world ;-) ); all molecular databases are based on this asserted fact. Even worse. Since there are 10^20 molecules in most used materials, many 'molecular' properties are really material properties. A melting point is not a molecular property, but is often even reported as an elemental property. Most users of molecular information aren't ignorant about the difference between a protein and a record of a protein; it's just that they don't want to deal with all the extra CS mechanics (that prevent getting their job done). And so an instance of a protein record in a database (or a reference to it from another database) is the closest thing to saying: here's the protein. Chemists are not interested in single molecules (well, most are not, but with increasing nanotechnology...). I was told recently that upper ontologies have proper mechanisms to point out the difference between (in Java terminology) objects and classes, or instances and concepts. Different records exist for the same protein, which indeed has been a historic point of complication; but this is really a social issue, not a semantic one, and the key data authorities have already for years coordinated on this point by supplying cross-references to each other. There is another level to this: that of a measurement or observation, and the identity we assign to it. The sequence of a protein, or the molecular structure of a drug, is the model that people assigned to some measurement. Measurements that point to the same measurable may actually be assigned different identities... Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: Announcement: Bio2RDF 0.3 released
Hi Peter, On Fri, Mar 20, 2009 at 7:56 AM, Peter Ansell ansell.pe...@gmail.com wrote: * Some http://database.bio2rdf.org/database:identifier URI's are given by this, but these aren't standard, and are only shown where there is still at least one SPARQL endpoint available which uses them. People should utilise the http://bio2rdf.org/database:identifier versions when linking to Bio2RDF. I'm using ChEBI IDs right now to link to your RDF with owl:sameAs: http://rdf.openmolecules.net/?InChI=1/C12H8O2S/c13-8-5-6-10-11(12(8)14)7-3-1-2-4-9(7)15-10/h1-6,13-14H Linking back to rdf.openmolecules.net can be done as shown above with the InChI. I'll hook up to your DrugBank and DBPedia later today. Do you already make links between ChEBI and DBPedia? I created links by converting SMILES into InChIs: http://chem-bla-ics.blogspot.com/2009/02/dbpedia-enters-rdfopenmoleculesnet.html Comments most welcome! Egon -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
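The linking described in this message can be written down as a single triple. The sketch below is a minimal illustration, not the code behind rdf.openmolecules.net: it builds the InChI-based URI in the form shown above, builds a Bio2RDF URI following the http://bio2rdf.org/database:identifier pattern quoted above, and joins them with owl:sameAs as one N-Triples line. The function names, the "chebi" database prefix, and the EXAMPLE_ID identifier are placeholders, not real values.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def inchi_uri(inchi):
    """URI for a molecule, in the form used in this thread."""
    return "http://rdf.openmolecules.net/?" + inchi


def bio2rdf_uri(database, identifier):
    """Bio2RDF URI following the database:identifier pattern."""
    return "http://bio2rdf.org/%s:%s" % (database, identifier)


def same_as_triple(subject_uri, object_uri):
    """Serialise one owl:sameAs statement as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (subject_uri, OWL_SAME_AS, object_uri)


# 'chebi' and 'EXAMPLE_ID' are placeholders, not a real ChEBI record.
triple = same_as_triple(
    inchi_uri("InChI=1/CH4/h1H4"),
    bio2rdf_uri("chebi", "EXAMPLE_ID"),
)
print(triple)
```

Linking in the other direction only needs the InChI, since the molecule URI is derived from it directly.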
Re: Introduction(s) to HCLS IG
Hi all, On Fri, Mar 6, 2009 at 8:27 PM, M. Scott Marshall marsh...@science.uva.nl wrote: Several new people have joined HCLS IG http://www.w3.org/2001/sw/hcls/ lately. Welcome! We have a tradition of sending an Introduction to the mailing list to help participants get to know each other and find common interests. Would those of you who haven't yet done so please send an introduction to the list? my apologies for not having sent around an introduction on who I am and what I do before, but here goes. My name is Egon Willighagen and I am currently a post-doc at Uppsala University, working on applying cheminformatics in drug discovery. I am one of the lead developers of the (open source) Chemistry Development Kit, and got a PhD (2008) in representation of molecular systems in light of data analysis, which involves distribution of data too, which explains my long standing interest in semantic markup of molecular data, such as Chemical Markup Language. My blog is a good resource for what I have been doing in general, http://chem-bla-ics.blogspot.com/, or otherwise my publications, http://www.citeulike.org/user/egonw/tag/papers. Currently I am working on setting up RDF for small molecules, with the InChI as central identifier: http://rdf.openmolecules.net/ In addition to this I am extending Bioclipse with RDF support (see my blog), which allows visualization of molecular data in 2D/3D and its use for model building and pattern recognition. In the past year I worked at Wageningen University, where I worked on a SKOS-based ontology for metabolomics, for which the software under development in an international consortium is available from http://metware.org/. Hoping this was informative, Egon Willighagen -- Post-doc @ Uppsala University http://chem-bla-ics.blogspot.com/
Re: C-SHALS 2009
On Fri, Jan 23, 2009 at 12:33 AM, Susie M Stephens stephens_susi...@lilly.com wrote: I wanted to give everyone a heads up about the C-SHALS Conference [1]. Thanx for the pointer. Is there an overview of conferences of interest for people on this list, newcomers like me in particular? (Is there any common 'tag' used by anyone in social bookmarking (etc)?) For myself, I am looking for something in Aug-Dec 2009... Thanx, Egon -- http://chem-bla-ics.blogspot.com/
Re: Overview of conferences (like C-SHALS)
On Fri, Jan 23, 2009 at 2:35 PM, Kei Cheung kei.che...@yale.edu wrote: You might also want to give del.icio.us (http://delicious.com/) a try. Yes, agreed. I have been using this for quite some time now. I'll start using the tags: hclsig, lodd and others, for things I find relevant to this list. With that in mind, any FriendFeed users around? http://friendfeed.com/rooms/hclsig? Egon -- http://chem-bla-ics.blogspot.com/
Re: Pharma Ontology Telcon Minutes
On Fri, Jan 23, 2009 at 5:42 PM, Susie M Stephens stephens_susi...@lilly.com wrote: The minutes are now available from yesterday's Pharma Ontology call [1]. Thanks very much to Christi for scribing. The chatting between Elgar and Scott M at the end of the IRC transcript was actually not by Elgar, but by Egon (me). Egon -- http://chem-bla-ics.blogspot.com/
Re: RDF for molecules, using InChI
Hi all, I have not applied all suggestions sent by people on the list, but wanted to give a short update. So, no RDF document pointing to all molecules with non-trivial RDF statements, and no RDF-based definitions of the properties used. Apologies for that. On 8/2/07, Egon Willighagen [EMAIL PROTECTED] wrote: I played a bit with RDF for molecular data this week, and now have an RDF provider service (try methane [1]), which is written in PHP and uses XSLT to create an HTML frontend (*). It works for any molecule/InChI, but depends on 'plugins' to set up any properties other than the implied ones (i.e. reproduce the InChI). I have added a new module that extracts 'tags' for molecules [1], and am quite happy with the setup. It is using the rdf.openmolecules.net URL identifier, which can be added to Connotea and tagged, like any journal article or website. The website uses the Connotea API to convert these tags into RDF properties for the InChIs. In the blog item, I give some applications of the tagging, like defined sets of molecules etc. SPARQL would be rather suited to extract such sets from the RDF statements. Again, comments most welcome. Egon PS. don't want to stir up the URL-versus-URI discussion. Tagging molecules has just been on my wish list for some time, and this seems to work well. 1.http://chem-bla-ics.blogspot.com/2007/09/tagging-molecules-mashup-of-connotea.html -- http://chem-bla-ics.blogspot.com/
Re: RDF for molecules, using InChI
Eric, On 8/18/07, Egon Willighagen [EMAIL PROTECTED] wrote: On 8/17/07, Eric Neumann [EMAIL PROTECTED] wrote: Thanks for the pointer-- is there a list of all the molecules you store something about? Not at this moment. That would be a rather lengthy RDF doc. The number of molecules of which something is known is currently in the order of 10M. I have not taken up the challenge of hosting that. Actually, I could make a list of some 250 molecules [1]. Should I make one RDF file listing all triples for all molecules, or make one master file, which points to the current RDF 'files'? Egon 1.http://cb.openmolecules.net/inchis.php -- http://chem-bla-ics.blogspot.com/
Re: RDF for molecules, using InChI
Hi Eric, On 8/17/07, Eric Neumann [EMAIL PROTECTED] wrote: Thanks for the pointer-- is there a list of all the molecules you store something about? Not at this moment. That would be a rather lengthy RDF doc. The number of molecules of which something is known is currently in the order of 10M. I have not taken up the challenge of hosting that. I've noticed your server will handle any InChI string it receives, though CIDs and other annotations will not be returned. Correct. It looks up information from other sources, so the 'service' is more like a relay or aggregator than a database. Since one can determine the molecular weight from InChI, would it make sense to include such a feature? Yes. One thing I am interested in is using SPARQL to find incorrect data in databases. It does happen that databases show one InChI (e.g. of salts), but derive properties from only one fragment... that will show up in the MW. Finally, I see you point to Pubchem CIDs and Pubchem uses InChIs as well, so is there any way to include all compounds PC refers to as well? Yes, I do plan to write a relay for PubChem. I will make these scripts open source asap, but have been busy with project reports and grant proposals the last two weeks. Egon -- http://chem-bla-ics.blogspot.com/
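The point above about deriving the molecular weight from an InChI can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the service: it reads only the chemical formula layer (the part after the version), supports dot-separated components with an optional leading multiplier, and uses a deliberately tiny table of abridged standard atomic weights. The function names and the table are hypothetical.

```python
import re

# Abridged standard atomic weights; only a few elements, for illustration.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def formula_layer(inchi):
    """Return the formula layer, e.g. 'CH4' for 'InChI=1/CH4/h1H4'."""
    body = inchi.split("=", 1)[1]   # drop the 'InChI=' prefix
    return body.split("/")[1]       # layer 0 is the version, layer 1 the formula


def formula_weight(formula):
    """Sum atomic masses over all dot-separated components of a formula."""
    total = 0.0
    for component in formula.split("."):
        # A component may start with a multiplier, as in '2ClH'.
        multiplier, atoms = re.match(r"(\d*)(.*)", component).groups()
        for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", atoms):
            total += int(multiplier or 1) * ATOMIC_MASS[symbol] * int(count or 1)
    return total


def molecular_weight(inchi):
    """Molecular weight implied by the formula layer of an InChI."""
    return formula_weight(formula_layer(inchi))


print(round(molecular_weight("InChI=1/CH4/h1H4"), 3))
```

For a salt, formula_weight over the whole formula layer and over a single dot-separated component give different values; comparing a database's reported MW against both is one way to spot the fragment-only properties mentioned above.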
Re: RDF for molecules, using InChI
Hi all, On 8/2/07, Egon Willighagen [EMAIL PROTECTED] wrote: 1.http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4 a quick update, rdf.openmolecules.net is now online, so the above URL becomes: http://rdf.openmolecules.net/?InChI=1/CH4/h1H4 Egon http://chem-bla-ics.blogspot.com/
RDF for molecules, using InChI
Hi all, I played a bit with RDF for molecular data this week, and now have an RDF provider service (try methane [1]), which is written in PHP and uses XSLT to create an HTML frontend (*). It works for any molecule/InChI, but depends on 'plugins' to set up any properties other than the implied ones (i.e. reproduce the InChI). The methane example mentioned shows some information extracted from Chemical blogspace [2], but I plan to write other plugins too, e.g. for PubChem, ChemSpider and other databases. I have written up some thoughts at [3], and would much like to hear your opinions and comments. Looking forward to hearing from you, kind regards, Egon Willighagen http://chem-bla-ics.blogspot.com/ *) FireFox 2.0.0.6 and IE pick up the declared stylesheet, but Konqueror/Linux does not. 1.http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4 2.http://cb.openmolecules.net/ 3.http://chem-bla-ics.blogspot.com/2007/07/rdf-ing-molecular-space.html