Re: [Rdkit-discuss] Beta of Q2 2011 Release Available

2011-07-04 Thread James Davidson
Hi Greg, 

  windows binary (py27, please :) )
 
 It's up on the google download page; hopefully I remembered 
 all the DLLs this time. :-S
 
 -greg


The binary works a treat - no sign of missing DLLs - thanks!


--
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using SQLAlchemy with the RDKit database cartridge

2011-07-04 Thread Riccardo Vianello
Hi Adrian,

On Mon, Jul 4, 2011 at 11:22 AM, Adrian Schreyer ams...@cam.ac.uk wrote:
 Hi Riccardo,

 are you planning on supporting other cartridges/database dialects? If
 you only want to include RDKit/PostgreSQL then your implementation
 might be much more than what you actually need :)

Yes, at present I'm focusing almost exclusively on RDKit/PostgreSQL,
but the longer-term goal includes supporting additional backends. I'm
particularly interested in the RDKit cartridge, and I plan to
implement full support for it, but you are correct: if that were the
only supported cartridge/database, this kind of approach could be
considered overkill.

 If you want to change the similarity thresholds you will need
 something like
 session.execute(text("SET rdkit.tanimoto_threshold=:threshold").execution_options(autocommit=True).params(threshold=threshold)),
 probably wrapped inside a function. It is important that you use
 execution_options(autocommit=True) because SQLAlchemy won't autocommit
 SET operations (if you set it in the engine config).

Useful hint, thanks. I'm still learning about SQLAlchemy and I would
have probably missed the proper management of the autocommit flag.
Btw, I actually postponed the management of the threshold values
because of some technical details that are unclear to me and I should
have asked about anyway. More specifically, I'm not sure about the
scope associated to these global parameters. The question is
probably for Greg, but do these values hold for the whole server, or
for the given database, or for the specific database connection?

 I also have rdkit in my database api but I went for the hybrid
 approach in SQLAlchemy that allows you to distinguish between methods
 on the class and instance level. With an instance of an rdkit molecule
 for example, the api will use the local rdkit installation for a
 substructure pattern match. On the class level however, the same
 expression is turned into an SQL expression to query the database. I
 also use the @reconstructor decorator to turn the database rdmol
 smiles string back in to a Python RDMol but this is only useful if you
 plan on using RDKit on the client side as well.

Currently I'm making no assumptions about the toolkit available on the
client side. I had quickly read the documentation on the hybrid
approach and wasn't convinced it was what I was looking for, but
judging from your examples I should reconsider that, at least in part.

Thanks a lot for your reply and suggestions,

Cheers,
Riccardo


 # instance of ChemCompRDMol
 print sti.RDMol.contains('c1c1')
 True

 # class itself
 print ChemCompRDMol.contains(sti.ism)
 pdbchem.chem_comp_rdmols.rdmol OPERATOR(rdkit.@>) :rdmol_1

 Here is an example:

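   # imports needed by the methods below:
   # from rdkit.Chem import MolFromSmiles
   # from sqlalchemy.orm import reconstructor
   # from sqlalchemy.ext.hybrid import hybrid_method

   @reconstructor
   def init_on_load(self):
       '''
       Turns the rdmol column that is returned as a SMILES string back into an
       RDMol object.
       '''
       self.rdmol = MolFromSmiles(self.rdmol)

   @hybrid_method
   def contains(self, smiles):
       '''
       Instance level: substructure match using the local RDKit installation.
       '''
       return self.rdmol.HasSubstructMatch(MolFromSmiles(smiles))

   @contains.expression
   def contains(self, smiles):
       '''
       Class level: SQL expression evaluated by the cartridge in the database.
       '''
       return self.rdmol.op('OPERATOR(rdkit.@>)')(smiles)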

 and that's basically it. I have the cartridge installed in its own
 schema, that's why I need the OPERATOR() syntax.
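
 For readers who have not met the hybrid idea before, here is a minimal
 sketch of the mechanism in plain Python, with no SQLAlchemy or RDKit
 required. It is not SQLAlchemy's implementation: hybrid_sketch, Compound
 and the column name are hypothetical, the instance-level body is only a
 string-containment placeholder for a real call such as HasSubstructMatch,
 and the operator is written as @>, the cartridge's substructure operator.

```python
class hybrid_sketch:
    """Toy descriptor: one method name, different behaviour on instance vs class."""

    def __init__(self, func):
        self.func = func   # used when called on an instance
        self.expr = None   # used when called on the class itself

    def expression(self, expr):
        # Register the class-level variant and keep the same descriptor.
        self.expr = expr
        return self

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: bind the SQL-building variant to the class.
            return self.expr.__get__(owner, type(owner))
        # Accessed on an instance: bind the local Python variant.
        return self.func.__get__(instance, owner)


class Compound:
    column = "chem_comp_rdmols.rdmol"  # hypothetical column name

    def __init__(self, smiles):
        self.smiles = smiles

    @hybrid_sketch
    def contains(self, fragment):
        # Placeholder only: a real version would run a toolkit substructure match.
        return fragment in self.smiles

    @contains.expression
    def contains(cls, fragment):
        # Class level: return an SQL fragment plus its bind parameters.
        return "%s OPERATOR(rdkit.@>) :rdmol_1" % cls.column, {"rdmol_1": fragment}
```

 Calling Compound("c1ccccc1O").contains("cccc") runs locally, whereas
 Compound.contains("c1ccccc1") only builds the SQL fragment and its
 parameters, which mirrors the instance/class split described above.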

 Cheers,

 Adrian

 On Fri, Jul 1, 2011 at 17:22, Riccardo Vianello
 riccardo.viane...@gmail.com wrote:
 Hi all,

 I've started working on an extension of the SQLAlchemy database
 toolkit that is aimed to support direct access from python to the
 functions and data types exposed by the database chemical cartridge.
 In brief this means that instead of interacting with the RDBMS using
 raw SQL queries, it may become possible to execute the entire workflow
 (data preprocessing and cleanup, insertion, selection and further
 processing) without leaving the python interpreter, and at the same
 time delegating the construction of the required SQL expressions to a
 higher-level API. Just to make a simple example, instead of using

 select count(*) from molecules where structure @> 'O=C1OC2=CC=CC=C2C=C1';

 one might type something like the following:

 constraint = Molecule.structure.contains('O=C1OC2=CC=CC=C2C=C1')
 print session.query(Molecule).filter(constraint).count()

 (ok, in this specific case the python expression is a bit more
 verbose, but it's a very simple SQL query :-)

 The project is still in an initial phase, and the code is far from
 being mature, but the development is currently strongly focused on the
 RDKit postgresql extension. Structure searches and molecular
 descriptors should be fully supported, and bit fingerprints and
 associated similarity operators are also available (but modifying the
 default threshold similarity values is not yet possible). The code is
 currently hosted on github

 https://github.com/rvianello/razi

 and some draft documentation (at the moment mainly intended to
 illustrate the idea 

[Rdkit-discuss] SDF with errors

2011-07-04 Thread JP
Is there any obvious reason why the following 8 molecules in the attached
sdf file give a bunch of kekulization errors?

Is it something to do with these lines -

M  CHG  2   4   1  13  -1
M  STY  1   1 DAT
M  SAL   1  1   7
M  SDT   1 MRV_IMPLICIT_H
M  SDD   1     0.    0.    DR    ALL  0       0
M  SED   1 IMPL_H1

(using the now old 2010_12_1)


MolPort-000-00_Sold_Out_Errors.sdf
Description: Binary data


Re: [Rdkit-discuss] SDF with errors

2011-07-04 Thread Greg Landrum
On Mon, Jul 4, 2011 at 5:49 PM, JP jeanpaul.ebe...@inhibox.com wrote:

 Is there any obvious reason why the following 8 molecules in the attached
 sdf file give a bunch of kekulization errors ?
 Is it something to do with these lines -
 M  CHG  2   4   1  13  -1
 M  STY  1   1 DAT
 M  SAL   1  1   7
 M  SDT   1 MRV_IMPLICIT_H
 M  SDD   1     0.    0.    DR    ALL  0       0
 M  SED   1 IMPL_H1
 (using the now old 2010_12_1)

Nope. Those molecules all contain bonds with bond-order set to
aromatic and nitrogen-containing aromatic heterocycles where one
(normally arbitrary) N needs an explicit H to make the ring aromatic.
There are two problems here:
1) aromatic bond orders really shouldn't be used in SD files that
aren't for query molecules.
2) The RDKit cannot figure out which N should have the H attached and,
as is typical of the RDKit, doesn't try to guess. A method for
randomly picking a tautomer that can be kekulized is described in
these threads:
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01162.html
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01185.html

Either fixing the SDF or running the random tautomer generator should work.
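
As a rough aid for the first option, the following sketch lists which records in an SD file use aromatic bond orders, so they can be fixed at the source. It uses only the standard library and assumes V2000 connection tables, where the bond block stores the bond type in the third three-character field and type 4 means aromatic. The function names are hypothetical, and this only locates the records; it does not repair them or pick a tautomer.

```python
def split_sd_records(sdf_text):
    """Yield each SD record as a list of lines, split on the $$$$ delimiter."""
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(l.strip() for l in record):
        yield record  # trailing record without a closing $$$$


def records_with_aromatic_bonds(sdf_text):
    """Return the title lines of V2000 records containing a type-4 bond."""
    flagged = []
    for lines in split_sd_records(sdf_text):
        # Line 4 of a molfile is the counts line; skip anything that is not V2000.
        if len(lines) < 4 or "V2000" not in lines[3]:
            continue
        n_atoms = int(lines[3][0:3])
        n_bonds = int(lines[3][3:6])
        bond_lines = lines[4 + n_atoms: 4 + n_atoms + n_bonds]
        # Bond line layout: atom1 (cols 1-3), atom2 (cols 4-6), type (cols 7-9).
        if any(int(b[6:9]) == 4 for b in bond_lines):
            flagged.append(lines[0].strip())
    return flagged
```

Something like records_with_aromatic_bonds(open(path).read()) then gives the titles of the records worth re-exporting with explicit single and double bonds.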

-greg



Re: [Rdkit-discuss] Using SQLAlchemy with the RDKit database cartridge

2011-07-04 Thread Greg Landrum
On Mon, Jul 4, 2011 at 3:15 PM, Riccardo Vianello
riccardo.viane...@gmail.com wrote:

 Useful hint, thanks. I'm still learning about SQLAlchemy and I would
 have probably missed the proper management of the autocommit flag.
 Btw, I actually postponed the management of the threshold values
 because of some technical details that are unclear to me and I should
 have asked about anyway. More specifically, I'm not sure about the
 scope associated to these global parameters. The question is
 probably for Greg, but do these values hold for the whole server, or
 for the given database, or for the specific database connection?

The setting applies to the current connection.
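
Because the value is per connection, it has to be set on every connection that will run similarity queries. A minimal sketch of a helper for that, assuming a DB-API connection whose driver uses %s placeholders (psycopg2 does); set_tanimoto_threshold is a hypothetical name, and PostgreSQL's set_config() function is used here in place of a literal SET statement so that the value can be passed as a parameter:

```python
def set_tanimoto_threshold(conn, threshold):
    """Set rdkit.tanimoto_threshold for this connection only."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    cur = conn.cursor()
    try:
        # set_config(name, value, is_local): is_local = false keeps the value
        # for the rest of the session, not just the current transaction.
        cur.execute(
            "SELECT set_config(%s, %s, false)",
            ("rdkit.tanimoto_threshold", str(threshold)),
        )
    finally:
        cur.close()
    # Commit so that a later rollback does not undo the setting.
    conn.commit()
```

With a connection pool, the helper would need to run each time a connection is handed out, since a different connection starts with the default threshold.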

-greg
