Re: [Rdkit-discuss] Beta of Q2 2011 Release Available
Hi Greg, windows binary (py27, please : ) )

It's up on the google download page; hopefully I remembered all the DLLs this time. :-S

-greg

The binary works a treat - no sign of missing DLLs - thanks!

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Using SQLAlchemy with the RDKit database cartridge
Hi Adrian,

On Mon, Jul 4, 2011 at 11:22 AM, Adrian Schreyer ams...@cam.ac.uk wrote:

Hi Riccardo, are you planning on supporting other cartridges/database dialects? If you only want to include RDKit/PostgreSQL then your implementation might be much more than what you actually need :)

Yes, at present I'm focusing almost exclusively on RDKit/PostgreSQL, but the longer-term goal includes supporting additional backends. I'm particularly interested in the RDKit cartridge, and I plan to implement full support for it, but you are correct: if that were the only supported cartridge/database, this kind of approach could be considered overkill.

If you want to change the similarity thresholds you will need something like

    session.execute(text("SET rdkit.tanimoto_threshold=:threshold").execution_options(autocommit=True).params(threshold=threshold))

probably wrapped inside a function. It is important that you use execution_options(autocommit=True) because SQLAlchemy won't autocommit SET operations (if you set it in the engine config).

Useful hint, thanks. I'm still learning about SQLAlchemy and I would probably have missed the proper management of the autocommit flag. Btw, I actually postponed the management of the threshold values because of some technical details that are unclear to me and that I should have asked about anyway. More specifically, I'm not sure about the scope associated with these global parameters. The question is probably for Greg, but do these values hold for the whole server, for the given database, or for the specific database connection?

I also have rdkit in my database api but I went for the hybrid approach in SQLAlchemy that allows you to distinguish between methods on the class and instance level. With an instance of an rdkit molecule, for example, the api will use the local rdkit installation for a substructure pattern match. On the class level, however, the same expression is turned into an SQL expression to query the database.
I also use the @reconstructor decorator to turn the database rdmol SMILES string back into a Python RDMol, but this is only useful if you plan on using RDKit on the client side as well.

Currently I'm making no assumptions about the toolkit available on the client side. I had quickly read the documentation on the hybrid approach and I wasn't convinced it was what I was looking for, but from your examples I would say I should reconsider that, at least in part.

Thanks a lot for your reply and suggestions,

Cheers,
Riccardo

    # instance of ChemCompRDMol
    print sti.RDMol.contains('c1c1')
    True

    # class itself
    print ChemCompRDMol.contains(sti.ism)
    pdbchem.chem_comp_rdmols.rdmol OPERATOR(rdkit.@) :rdmol_1

Here is an example:

    from rdkit.Chem import MolFromSmiles
    from sqlalchemy.ext.hybrid import hybrid_method
    from sqlalchemy.orm import reconstructor

    # the following are methods of the mapped class

    @reconstructor
    def init_on_load(self):
        '''
        Turns the rdmol column that is returned as a SMILES string
        back into an RDMol object.
        '''
        self.rdmol = MolFromSmiles(self.rdmol)

    @hybrid_method
    def contains(self, smiles):
        ''' Instance level: substructure match with the local RDKit. '''
        return self.rdmol.HasSubstructMatch(MolFromSmiles(smiles))

    @contains.expression
    def contains(cls, smiles):
        ''' Class level: SQL expression evaluated by the database. '''
        return cls.rdmol.op('OPERATOR(rdkit.@)')(smiles)

and that's basically it. I have the cartridge installed in its own schema, that's why I need the OPERATOR() syntax.

Cheers,
Adrian

On Fri, Jul 1, 2011 at 17:22, Riccardo Vianello riccardo.viane...@gmail.com wrote:

Hi all,

I've started working on an extension of the SQLAlchemy database toolkit that aims to support direct access from Python to the functions and data types exposed by the database chemical cartridge. In brief, this means that instead of interacting with the RDBMS using raw SQL queries, it may become possible to execute the entire workflow (data preprocessing and cleanup, insertion, selection and further processing) without leaving the Python interpreter, while delegating the construction of the required SQL expressions to a higher-level API.
Just to give a simple example, instead of using

    select count(*) from molecules where structure @ 'O=C1OC2=CC=CC=C2C=C1';

one might type something like the following:

    constraint = Molecule.structure.contains('O=C1OC2=CC=CC=C2C=C1')
    print session.query(Molecule).filter(constraint).count()

(ok, in this specific case the python expression is a bit more verbose, but it's a very simple SQL query :-)

The project is still in an initial phase, and the code is far from mature, but development is currently strongly focused on the RDKit PostgreSQL extension. Structure searches and molecular descriptors should be fully supported, and bit fingerprints and the associated similarity operators are also available (but modifying the default similarity threshold values is not yet possible).

The code is currently hosted on github https://github.com/rvianello/razi and some draft documentation (at the moment mainly intended to illustrate the idea
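The class-level versus instance-level split discussed in this message can be illustrated with a plain Python descriptor. What follows is only a minimal standard-library sketch of the dispatch idea, not SQLAlchemy's implementation: `hybrid_sketch`, `Compound`, and the `compounds` table name are hypothetical, and a plain substring test stands in for a real substructure match.

```python
import types


class hybrid_sketch:
    """Descriptor that dispatches differently on instance and class access."""

    def __init__(self, func):
        self.func = func  # used when accessed on an instance
        self.expr = func  # used when accessed on the class (defaults to func)

    def expression(self, expr):
        """Register the class-level variant and return the descriptor itself."""
        self.expr = expr
        return self

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed as Compound.contains: bind the class-level variant to the class.
            return types.MethodType(self.expr, owner)
        # Accessed as compound.contains: bind the instance-level variant to the object.
        return types.MethodType(self.func, instance)


class Compound:
    table = "compounds"  # hypothetical table name

    def __init__(self, smiles):
        self.smiles = smiles

    @hybrid_sketch
    def contains(self, fragment):
        # Stand-in for a local substructure match (assumption: substring test only).
        return fragment in self.smiles

    @contains.expression
    def contains(cls, fragment):
        # Builds an SQL fragment plus bound parameters instead of evaluating anything.
        return (f"{cls.table}.structure @ %s", (fragment,))


# Instance access evaluates locally; class access yields something to send to the database.
print(Compound("O=C1OC2=CC=CC=C2C=C1").contains("C=C"))
print(Compound.contains("C=C"))
```

The same method name therefore serves both purposes: on an object it answers the question in the client, and on the class it describes the query for the server.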
[Rdkit-discuss] SDF with errors
Is there any obvious reason why the following 8 molecules in the attached sdf file give a bunch of kekulization errors? Is it something to do with these lines -

    M CHG 2 4 1 13 -1
    M STY 1 1 DAT
    M SAL 1 1 7
    M SDT 1 MRV_IMPLICIT_H
    M SDD 1 0.0.DRALL 0 0
    M SED 1 IMPL_H1

(using the now old 2010_12_1)

MolPort-000-00_Sold_Out_Errors.sdf
Description: Binary data
Re: [Rdkit-discuss] SDF with errors
On Mon, Jul 4, 2011 at 5:49 PM, JP jeanpaul.ebe...@inhibox.com wrote:

Is there any obvious reason why the following 8 molecules in the attached sdf file give a bunch of kekulization errors? Is it something to do with these lines -

    M CHG 2 4 1 13 -1
    M STY 1 1 DAT
    M SAL 1 1 7
    M SDT 1 MRV_IMPLICIT_H
    M SDD 1 0. 0. DR ALL 0 0
    M SED 1 IMPL_H1

(using the now old 2010_12_1)

Nope. Those molecules all contain bonds with the bond order set to aromatic, in nitrogen-containing aromatic heterocycles where one (normally arbitrary) N needs an explicit H to make the ring aromatic. There are two problems here:

1) Aromatic bond orders really shouldn't be used in SD files that aren't for query molecules.
2) The RDKit cannot figure out which N should have the H attached and, as is typical of the RDKit, doesn't try to guess.

A method for randomly picking a tautomer that can be kekulized is described in these threads:
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01162.html
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01185.html

Either fixing the SDF or running the random tautomer generator should work.

-greg
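To see which records in an SD file use the aromatic bond type before deciding how to fix them, the bond blocks can be scanned directly. This is a minimal sketch using only the standard library. It assumes V2000 mol blocks (counts on the fourth line, bond type in columns 7-9 of each bond line, with 4 meaning aromatic). The function names and the two-atom records are made up for illustration and are not taken from the attached file.

```python
def split_sdf(sdf_text):
    """Yield one record (as text) per '$$$$' terminator; unterminated tails are ignored."""
    record = []
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):
            yield "\n".join(record)
            record = []
        else:
            record.append(line)


def aromatic_bond_count(molblock):
    """Count bonds with V2000 bond type 4 (aromatic) in one mol block."""
    lines = molblock.splitlines()
    counts = lines[3]  # counts line: atom count in columns 1-3, bond count in 4-6
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    return sum(1 for line in bond_lines if int(line[6:9]) == 4)


def records_with_aromatic_bonds(sdf_text):
    """Return (record index, name, aromatic bond count) for records using bond type 4."""
    flagged = []
    for index, record in enumerate(split_sdf(sdf_text)):
        count = aromatic_bond_count(record)
        if count:
            name = record.splitlines()[0].strip()
            flagged.append((index, name, count))
    return flagged


# Two made-up records: the first uses bond type 4, the second a plain single bond.
EXAMPLE_SDF = "\n".join([
    "aromatic-example",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  4  0",
    "M  END",
    "$$$$",
    "single-bond-example",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
    "$$$$",
])

print(records_with_aromatic_bonds(EXAMPLE_SDF))
```

Records flagged this way are the candidates for either rewriting the bond orders or placing the H on a specific N, which are the two fixes mentioned above.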
Re: [Rdkit-discuss] Using SQLAlchemy with the RDKit database cartridge
On Mon, Jul 4, 2011 at 3:15 PM, Riccardo Vianello riccardo.viane...@gmail.com wrote:

Useful hint, thanks. I'm still learning about SQLAlchemy and I would have probably missed the proper management of the autocommit flag. Btw, I actually postponed the management of the threshold values because of some technical details that are unclear to me and I should have asked about anyway. More specifically, I'm not sure about the scope associated to these global parameters. The question is probably for Greg, but do these values hold for the whole server, or for the given database, or for the specific database connection?

The setting applies to the current connection.

-greg
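Since the setting is per connection, it has to be issued on the same connection that later runs the similarity query. This is a minimal sketch of a wrapper function, assuming a DB-API 2.0 connection with the `%s` parameter style (as psycopg2 uses). `set_tanimoto_threshold` is a hypothetical helper name. PostgreSQL's `set_config()` is used here in place of a literal SET so that the value can be passed as a bound parameter; that is a substitution made for this sketch, not what was posted earlier in the thread.

```python
def set_tanimoto_threshold(connection, threshold):
    """Set rdkit.tanimoto_threshold for the session behind this DB-API connection."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    cursor = connection.cursor()
    try:
        # Third argument false: the value lasts for the session, not just the transaction.
        cursor.execute(
            "SELECT set_config('rdkit.tanimoto_threshold', %s, false)",
            (str(threshold),),
        )
    finally:
        cursor.close()
```

Each new connection starts with the default again, so with a connection pool the helper would need to run on every connection that is handed out, for example from a hook that receives the raw DB-API connection. Note also that PostgreSQL undoes a session-level setting if the transaction it was issued in is rolled back.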