Re: [Rdkit-discuss] Beta of RDKit knime nodes available

2010-11-26 Thread James Davidson

Hi Greg and Thorsten,


> Greg:
>
>> Thorsten:
>> On the other hand, 4000 rows should not take that long in KNIME. How
>> much time does it currently take?
>
> I just did 1000 rows on my macbook. Assuming I'm reading the knime log
> correctly, that took about a minute.


Thanks for testing this out, Greg.  I must confess, I didn't wait for
the hierarchical clustering to finish for the 4000!  Going back and
selecting a random 1000 molecule subset, I reproduce your result of ~ 1
min (I get 67 secs).  If I then go to 2000, it takes 520 secs - so to me
this looks like cubic complexity - which is what the documentation for
the node states (this would mean > 1 hr for my original 4000...)
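
As a rough check of that estimate, the two timings above (67 s for 1000
rows, 520 s for 2000 rows) can be used to fit an empirical exponent and
extrapolate to 4000 rows. This is only a sketch: it assumes the run time
follows a simple power law t = c * n**k, and two data points are not
enough to establish that. The function names are illustrative.

```python
import math


def fitted_exponent(n1, t1, n2, t2):
    """Exponent k in t = c * n**k that passes through both timings."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n_ref, t_ref, n_new, k):
    """Predicted time at n_new, scaled from a reference point with exponent k."""
    return t_ref * (n_new / n_ref) ** k


# Timings reported in the message: 67 s for 1000 rows, 520 s for 2000 rows.
k = fitted_exponent(1000, 67.0, 2000, 520.0)
t_fit = extrapolate(2000, 520.0, 4000, k)      # using the fitted exponent
t_cubic = extrapolate(2000, 520.0, 4000, 3.0)  # assuming exactly cubic scaling

print(f"fitted exponent: {k:.2f}")
print(f"4000 rows, fitted exponent: {t_fit / 60:.0f} min")
print(f"4000 rows, cubic scaling:   {t_cubic / 60:.0f} min")
```

With these numbers the fitted exponent comes out just under 3, and both
extrapolations for 4000 rows land a little over an hour, in line with the
estimate above.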

For completeness - this result was with the Hierarchical
Clustering(DistMatrix) node set with 'Tanimoto' similarity and 'Complete
Linkage' for cluster comparison.  Changing the comparison to 'Single
Linkage' did not reduce the time.
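
To make the cubic behaviour concrete, here is a naive agglomerative
clustering with complete linkage on Tanimoto distances. This is not the
KNIME node's implementation (which is not shown in this thread); it is a
minimal sketch in which fingerprints are represented as Python sets of
on-bit indices, and all names are illustrative. Each merge step scans all
cluster pairs, which costs on the order of n**2 distance lookups, and
there are up to n - 1 merges, so the total work grows roughly as n**3.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two fingerprints given as sets of on bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def complete_linkage(fps, n_clusters):
    """Naive complete-linkage clustering; returns lists of fingerprint indices."""
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(fps))]
    # Precompute the full pairwise distance matrix (n**2 / 2 entries).
    dist = {
        (i, j): tanimoto_distance(fps[i], fps[j])
        for i, j in combinations(range(len(fps)), 2)
    }

    def linkage(c1, c2):
        # Complete linkage: the largest distance between any two members.
        return max(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Scan every pair of clusters and merge the closest pair.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Toy fingerprints: two similar pairs with no bits shared between the pairs.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}]
result = complete_linkage(fps, 2)
print(result)
```

Swapping `max` for `min` inside `linkage` would give single linkage with
the same scan over all cluster pairs, which may be why changing the
linkage setting did not change the timing.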

Interestingly, the documentation for the 'standard' Hierarchical
Clustering (i.e. non-distance-matrix) node states that it operates with
"n-squared complexity".  I guess other clustering algorithms available
in knime must scale better than cubically as well (k-means, fuzzy
c-means?) - but as far as I can see they don't currently operate on
distance matrices (or directly on bit vectors).  If they could, then
this may be a solution; another would be implementing the Murtagh
algorithm (I am guessing the scaling is below cubic from my recollection
of the speeds observed in rdkit).

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the "Company address and 
registration details" link at the bottom of the page..
__

--
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Handling certain sterochemistry in reactions

2010-11-26 Thread Greg Landrum
James,

On Fri, Nov 26, 2010 at 5:28 PM, James Davidson  wrote:
>
> I wonder if anybody can help with the following?  I am trying to figure out
> how to handle double-bond stereochemistry in reactions when the
> stereochemistry is involved with the making / breaking bond.  Hopefully this
> example will explain better than that sentence(!):
>
> rxn = AllChem.ReactionFromSmarts('[c:1][Cl,Br,I].[#6:2][B]>>[*:1][*:2]')
> mol1 = Chem.MolFromSmiles('c1ccccc1Br')
> mol2 = Chem.MolFromSmiles(r'C\C=C\B(O)O')
> ps = rxn.RunReactants((mol1, mol2))
> Chem.MolToSmiles(ps[0][0], True)
>
> --->  'CC=Cc1ccccc1' (stereochemical information lost)
>
> whereas using mol2 = Chem.MolFromSmiles(r'C\C=C\c1ccccc1B(O)O') gives
>
> --->  'C/C=C/c1ccccc(-c2ccccc2)1' (stereochemical information retained)

Yeah. Here's an even shorter illustration of correct behavior:

In [11]: mol2 = Chem.MolFromSmiles(r'C\C=C\CB(O)O')

In [12]: ps = rxn.RunReactants((mol1, mol2))

In [13]: Chem.MolToSmiles(ps[0][0], True)
Out[13]: 'C/C=C/Cc1ccccc1'

As you picked up already, the problems start when you are making
changes that directly affect an atom/bond that's involved in
stereochemistry. This is a bug.

Here's another illustration of the same thing with atomic stereochemistry:

In [15]: rxn = AllChem.ReactionFromSmarts('[C:1]Cl>>[*:1]F')

In [19]: m=Chem.MolFromSmiles('cl...@](Cl)(Br)I')

In [20]: ps = rxn.RunReactants((m,))

In [21]: Chem.MolToSmiles(ps[0][0], True)
Out[21]: 'f...@](Cl)(Br)I'

In [22]: Chem.MolToSmiles(ps[1][0], True)
Out[22]: 'FC(Br)(I)CCl'

Notice that the stereochemistry is fine in the first case (Out[21])
since the reaction didn't affect the atom that is tagged, but that
stereochemistry is lost when the tagged atom is modified (Out[22]).

> Having got to the end of explaining that, I am thinking that the way I
> should handle this is to check for 'problem' reactants and pass to a more
> specific rSMARTS when required!

I don't think that will help. Unfortunately the problem is in the way
stereochemical information is handled in the reaction code. My gut
feeling is that this is not going to be a quick one to fix, but I will
take a look.

Best Regards,
-greg



Re: [Rdkit-discuss] Postgres cartridge '=' operator

2010-11-26 Thread Greg Landrum
Hi,

On Fri, Nov 26, 2010 at 6:04 PM, gia...@gmail.com  wrote:
> I'm playing with the Postgres cartridge and built a small database to
> make some tests. I see from the wiki pages I can do substructures and
> superstructures search with the @> and <@ operators but there's no
> mention of a "=" operator.
>
> The question is, what if I want to search the exact input structure in
> my DB? is '=' implemented and just not documented or should I use one
> of the other operators to do the search?

At the moment there isn't a particularly satisfying way of doing an
equality search aside from adding a smiles column to the database and
just doing a straight equality search on that.
To that end it's probably useful to know that the SMILES generated by
the cartridge when you convert a molecule to text is canonical. So one
way to get a canonical SMILES to query with is:
select 'CC(=O)c1ccc2c(C(=O)C(=O)N2C)c1'::mol::text;

Without adding the smiles column, another option that should be
correct, though it's somewhat ugly, is:
select * from mols where m<@'CC(=O)c1ccc2c(c1)C(=O)C(=O)N2C' and
m@>'CC(=O)c1ccc2c(c1)C(=O)C(=O)N2C' and
m::text='CC(=O)c1ccc2c(c1)C(=O)C(=O)N2C'::mol::text;

If the molecule column is indexed, this will use the index so it's
actually reasonably efficient. If you don't care about stereochemistry
you can leave the last bit (SMILES comparison) out.
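
The query pattern above can be wrapped in a small helper. This is only a
sketch: the function name is hypothetical, the default table and column
names (mols, m) are taken from the example above, and the %s placeholders
assume a DB-API driver that uses that style, such as psycopg2. It only
builds the query text and parameters; it does not connect to a database.

```python
def exact_match_query(smiles, table="mols", column="m", match_stereo=True):
    """Build an exact-structure query from the substructure/superstructure operators.

    The table and column names are interpolated directly into the SQL, so
    they must come from trusted code, never from user input. The SMILES is
    passed as a bound parameter.
    """
    clauses = [
        f"{column} <@ %s::mol",  # same operator order as the example query
        f"{column} @> %s::mol",
    ]
    params = [smiles, smiles]
    if match_stereo:
        # The canonical SMILES comparison is the part that checks stereochemistry.
        clauses.append(f"{column}::text = %s::mol::text")
        params.append(smiles)
    sql = f"select * from {table} where " + " and ".join(clauses)
    return sql, tuple(params)


sql, params = exact_match_query("CC(=O)c1ccc2c(c1)C(=O)C(=O)N2C")
print(sql)
print(params)
```

The resulting pair could then be passed to a cursor's execute method.
Setting match_stereo=False drops the final SMILES comparison, as
described above for the case where stereochemistry does not matter.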

Having a less ugly way of doing equality querying would be useful;
that would be a good feature request.

-greg



[Rdkit-discuss] Postgres cartridge '=' operator

2010-11-26 Thread gia...@gmail.com
I'm playing with the Postgres cartridge and built a small database to
make some tests. I see from the wiki pages I can do substructures and
superstructures search with the @> and <@ operators but there's no
mention of a "=" operator.

The question is, what if I want to search the exact input structure in
my DB? is '=' implemented and just not documented or should I use one
of the other operators to do the search?

-- 
Gianluca Sforna

http://morefedora.blogspot.com
http://identi.ca/giallu - http://twitter.com/giallu



[Rdkit-discuss] Handling certain sterochemistry in reactions

2010-11-26 Thread James Davidson
Dear All,
 
I wonder if anybody can help with the following?  I am trying to
figure out how to handle double-bond stereochemistry in reactions when
the stereochemistry is involved with the making / breaking bond.
Hopefully this example will explain better than that sentence(!):
 
from rdkit import Chem
from rdkit.Chem import AllChem

rxn = AllChem.ReactionFromSmarts('[c:1][Cl,Br,I].[#6:2][B]>>[*:1][*:2]')
mol1 = Chem.MolFromSmiles('c1ccccc1Br')
mol2 = Chem.MolFromSmiles(r'C\C=C\B(O)O')
ps = rxn.RunReactants((mol1, mol2))
Chem.MolToSmiles(ps[0][0], True)

--->  'CC=Cc1ccccc1' (stereochemical information lost)

whereas using mol2 = Chem.MolFromSmiles(r'C\C=C\c1ccccc1B(O)O') gives

--->  'C/C=C/c1ccccc(-c2ccccc2)1' (stereochemical information retained)
 
Not quite the same, but I have read through some related SMIRKS info
here: http://www.daylight.com/dayhtml/doc/theory/theory.smirks.html
However, this explains how to handle stereo centres and stereo bonds in
reactions when they are explicitly defined on both sides of the
reaction.  I guess what I am looking for is a shortcut for saying
'retain' or 'invert' stereochemistry at reacting centre (sp3) or bond
attached to reacting centre (sp2)...
 
Having got to the end of explaining that, I am thinking that the way I
should handle this is to check for 'problem' reactants and pass to a
more specific rSMARTS when required!
 
Kind regards
 
James



Re: [Rdkit-discuss] RPM packages for Fedora

2010-11-26 Thread gia...@gmail.com
On Fri, Nov 26, 2010 at 5:53 AM, Greg Landrum  wrote:
> Yesterday and this morning I was able to install and (briefly) test
> the 64bit rpms on a clean Fedora 14 image. Everything worked without
> problems... this is very cool. I'm particularly pleased that the
> cartridge is there. :-)

Yeah. I'm playing with it at work; very, very cool stuff :-)

>
> One thing I noticed while poking around: it looks like some CMake
> residues from the build and testing process made it into the Projects
> directory.

Well, I just pick up and package everything that "make install" moves
to the buildroot. We probably need to check what cmake is doing there.


-- 
Gianluca Sforna

http://morefedora.blogspot.com
http://identi.ca/giallu - http://twitter.com/giallu
