What about this paper, Steve? It is based on the one mentioned by Egon.
Interoperable chemical structure search service
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0367-2

Results
We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases, and simplifies writing of heterogeneous queries that include chemical-structure search terms.

Regards,
Cristian

Cristian Bologa, Ph.D.
Research Professor, Div. of Translational Informatics, Dept. of Internal Medicine,
Univ. of New Mexico, School of Medicine,
Innovation Discovery&Training Center, MSC09 5025, 700 Camino de Salud NE, Albuquerque, NM 87131
tel: +1 (505) 925-7534 fax: +1 (505) 925-7625
----------------------
"If you never fail, it means you are not trying hard enough"

________________________________
From: Steve Vestal <steve.ves...@adventiumlabs.com>
Sent: Thursday, December 3, 2020 4:08 AM
To: Egon Willighagen
Cc: BlueObelisk-Discuss
Subject: Re: [BlueObelisk-discuss] Structure database that can be queried by SPARQL?

Thanks, this was an interesting paper. I am in fact curious about the substructure search problem. I would appreciate a sanity-check on my understanding of this paper. My impression was that partially ordered fingerprints are used in an initial relational database comparison query to obtain a modestly sized set of candidate structures, after which a subgraph matching algorithm (e.g., a VF2 variant) is applied sequentially to each element of that set to get an exact answer. Is that the general approach? I got the vague impression the sequential subgraph matching, not the fingerprint comparison query, is the performance bottleneck -- is that generally true in this approach?
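The screen-then-verify pattern described in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: molecules are toy atom and bond lists with implicit hydrogens, the fingerprint is a simple set of element and bonded-pair features (not the fingerprints used in the paper), and the exact step is a plain backtracking matcher standing in for VF2. All function names are hypothetical.

```python
# Toy two-stage substructure search: cheap fingerprint screen, then exact matching.
# Assumption: a molecule is (atoms, bonds) with implicit hydrogens.

def mol(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, bond_order)."""
    return atoms, {frozenset((i, j)): order for i, j, order in bonds}

def fingerprint(m):
    """One feature per element and per (element, element, order) bonded pair."""
    atoms, bonds = m
    feats = set(atoms)
    for pair, order in bonds.items():
        i, j = tuple(pair)
        a, b = sorted((atoms[i], atoms[j]))
        feats.add((a, b, order))
    return frozenset(feats)

def is_substructure(query, target):
    """Exact check by backtracking (a simple stand-in for VF2)."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target

    def extend(mapping):
        k = len(mapping)              # next query atom to place
        if k == len(q_atoms):
            return True
        for t in range(len(t_atoms)):
            if t in mapping or t_atoms[t] != q_atoms[k]:
                continue
            # Every query bond from atom k to an already placed atom
            # must exist in the target with the same order.
            ok = True
            for j in range(k):
                order = q_bonds.get(frozenset((j, k)))
                if order is not None and t_bonds.get(frozenset((mapping[j], t))) != order:
                    ok = False
                    break
            if ok and extend(mapping + [t]):
                return True
        return False

    return extend([])

def search(query, database):
    q_fp = fingerprint(query)
    # Stage 1: the screen can only rule molecules out, never confirm a hit.
    candidates = [n for n, m in database.items() if q_fp <= fingerprint(m)]
    # Stage 2: exact matching, applied one candidate at a time.
    hits = [n for n in candidates if is_substructure(query, database[n])]
    return candidates, hits

database = {
    "ethanol": mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "acetic acid": mol(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)]),
    "acetone": mol(["C", "C", "O", "C"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)]),
    "1-methoxypropene": mol(["C", "C", "C", "O", "C"],
                            [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 1)]),
}
query = mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])   # C-C-O fragment
candidates, hits = search(query, database)
print("candidates:", candidates)
print("hits:", hits)
```

In this toy database the screen discards acetone, lets 1-methoxypropene through as a false positive, and the exact matcher then rejects it. The cost of the second stage grows with the number of candidates the screen lets through.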
To answer the earlier question, I am interested in seeing if graph database and description logic technologies can be applied to structure queries. To play around with that, I would want a true graph database representation of structure. I looked at ChEBI, which, like PubChem, is also available in RDF format and, like PubChemRDF, encodes structure using SMILES strings rather than RDF graphs.

Does anyone know of any structure database that uses an attributed graph rather than a string representation? Does anyone know of an open source software package that can convert SMILES strings into RDF (brass ring) or any sort of attributed graph data structure? What about open source tools to generate graphical visualizations from SMILES strings? I assume those would have this capability buried inside them. The CDK page cites a few export formats, SMILES, SDF, InChI, Mol2, CML, *and others*. Are any of the formats attributed graph data structures?

On 12/2/2020 12:09 PM, Egon Willighagen wrote:

Please have a look at: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0282-y

On Tue, Dec 1, 2020 at 4:04 PM Steve Vestal <steve.ves...@adventiumlabs.com> wrote:

Does anyone know of a structure database that can be queried using an RDF query language like SPARQL? PubChemRDF can be accessed in RDF format, but it encodes structures as SMILES strings, which cannot be queried in this way. If not, can anyone suggest open source software that might be used to construct a modest RDF dataset from an existing structure database for the purpose of experimenting? For example, software that can translate SMILES strings into an annotated graph data structure of some sort?

Thanks in advance for any suggestions.
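As a rough illustration of what "translating SMILES strings into an annotated graph" could look like, here is a minimal sketch. It assumes a small SMILES subset only (unbracketed B, C, N, O, P, S, F, Cl, Br, I atoms, the bonds - = #, branches, and single-digit ring closures; no aromatic atoms, charges, or stereo), and it writes N-Triples under a made-up `http://example.org/chem/` vocabulary. The function names and predicates are hypothetical; a real experiment would more likely rely on an existing toolkit's SMILES parser.

```python
# Sketch: tiny SMILES subset -> attributed graph -> N-Triples lines.
# Assumption: the namespace and predicate names below are invented for illustration.
EX = "http://example.org/chem/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def parse_smiles(smiles):
    """Return (atoms, bonds): element symbols and (i, j, order) tuples."""
    atoms, bonds = [], []
    prev = None      # atom the next atom will bond to
    stack = []       # open branch points
    rings = {}       # ring-closure digit -> (atom index, bond order)
    order = 1        # order of the pending bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in ("Cl", "Br"):
            sym, i = smiles[i:i + 2], i + 2
        elif ch in "BCNOPSFI":
            sym, i = ch, i + 1
        else:
            sym, i = None, i + 1
        if sym is not None:
            atoms.append(sym)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in rings:
                start, ring_order = rings.pop(ch)
                bonds.append((start, prev, max(order, ring_order)))
            else:
                rings[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds

def to_ntriples(mol_id, atoms, bonds):
    """Bonds become nodes of their own so they can carry an order attribute."""
    mol_iri = f"<{EX}{mol_id}>"
    lines = []
    for idx, sym in enumerate(atoms):
        atom = f"<{EX}{mol_id}/atom{idx}>"
        lines.append(f"{mol_iri} <{EX}hasAtom> {atom} .")
        lines.append(f'{atom} <{EX}element> "{sym}" .')
    for n, (a, b, order) in enumerate(bonds):
        bond = f"<{EX}{mol_id}/bond{n}>"
        lines.append(f"{mol_iri} <{EX}hasBond> {bond} .")
        lines.append(f"{bond} <{EX}bondsAtom> <{EX}{mol_id}/atom{a}> .")
        lines.append(f"{bond} <{EX}bondsAtom> <{EX}{mol_id}/atom{b}> .")
        lines.append(f'{bond} <{EX}order> "{order}"^^<{XSD_INT}> .')
    return lines

atoms, bonds = parse_smiles("CC(=O)O")   # acetic acid
triples = to_ntriples("acetic_acid", atoms, bonds)
print("\n".join(triples))
```

Modelling each bond as its own node is one possible design choice: a plain RDF triple cannot carry attributes, so bond order needs either an intermediate node like this or one predicate per bond type. Once loaded into a triple store, a graph shaped like this could be matched with ordinary SPARQL graph patterns.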
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss

--
Have you heard about Wikidata already? "Use Scholia and Wikidata to find scientific literature" is a new tutorial from my colleague Lauren Dupuis. https://laurendupuis.github.io/Scholia_tutorial/

-----
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286 (http://orcid.org/0000-0001-7542-0286)
ImpactStory: https://impactstory.org/u/egonwillighagen