[Rdkit-discuss] Number of Aromatic Rings

2010-06-10 Thread James Davidson
Dear Greg,
 
I have been trying to figure out how to return the count of aromatic rings
for molecules (in Python), and am going to have to admit defeat!  I saw
in an earlier message
(http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg00153.html)
a similar query, but I'm afraid it didn't help me very much.  I
also read the section on Aromaticity in the RDKit book, and realised
that maybe this isn't a trivial exercise!
 
I would like the count to cover aromatic ring systems, such that a bicyclic
system (e.g. indole or naphthalene) would only count as 1.  For reference, this
appears to be the behaviour of the OpenEye
OEDetermineAromaticRingSystems function - where the molecule derived
from the SMILES "C(O)(=O)c1cccc2c1[nH]c(C3CCCc4ccccc34)c2" (which
contains an indole and a tetrahydronaphthalene) gives a count of 2.
 
Any help would be greatly appreciated.
 
Thanks
 
James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the "Company address and 
registration details" link at the bottom of the page..
__--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Number of Aromatic Rings

2010-06-10 Thread Cedric MORETTI
Here is a suggestion, though it is not a rigorous approach.
An aromatic ring shows up in a SMILES as, for example, c1ccccc1.
So if you count the occurrences of "1" and divide by 2, you get the number of
aromatic rings.
If the structure contains several rings, count the occurrences of 1, 2, 3, 4, 5
and 6, sum them and divide by 2.
No?
C




**
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded 
copies (which may contain alterations) subsequently transmitted from Firmenich, 
are confidential and solely for the use of the intended recipient. The contents 
do not represent the opinion of Firmenich except to the extent that it relates 
to their official business.
**



Re: [Rdkit-discuss] Number of Aromatic Rings

2010-06-10 Thread James Davidson
Thanks for the suggestion, Cedric.
 
I think something like this could work - but it would not get around the
issue of bicyclic aromatics, unless more advanced string parsing were
used.  For example, indole can be represented as [nH]1ccc2ccccc12 in
SMILES, so it would give two aromatic rings from your suggestion.  However,
it does occur to me that for bicyclic systems the 'a12' motif would
always be present(?), so maybe...
 
James
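
A minimal sketch of the digit-counting idea discussed above, to make the bicyclic problem concrete. The function name count_ring_closures is hypothetical, and the sketch assumes single-digit ring closures only (no %nn closures). It counts every ring closure, aromatic or not, so it is an illustration of the heuristic rather than a working aromatic-ring counter.

```python
import re

def count_ring_closures(smiles):
    """Estimate the ring count as (number of ring-closure digits) // 2.

    Assumptions: closures use single digits only (no %nn), and digits inside
    square brackets (isotopes, charges, H counts) are not ring closures.
    """
    # Drop bracket atoms such as [nH] or [13C] so their digits are ignored.
    stripped = re.sub(r"\[[^\]]*\]", "", smiles)
    digits = sum(ch.isdigit() for ch in stripped)
    return digits // 2

print(count_ring_closures("c1ccccc1"))          # benzene: 1
print(count_ring_closures("[nH]1ccc2ccccc12"))  # indole: 2, although it is one fused system
print(count_ring_closures("C1CCCCC1"))          # cyclohexane: 1, even though it is not aromatic
```

Indole comes out as 2 because the heuristic sees two ring closures; it has no notion of the two rings sharing a bond, which is the limitation raised in the reply above.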




Re: [Rdkit-discuss] Number of Aromatic Rings

2010-06-10 Thread Greg Landrum
Dear James,

On Thu, Jun 10, 2010 at 2:35 PM, James Davidson  wrote:
>
> I have been trying to figure out how to return the count of aromatic rings for
> molecules (in Python), and am going to have to admit defeat!  I saw in an
> earlier message
> (http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg00153.html)
> a similar query, but I'm afraid it didn't help me very much.  I also read
> the section on Aromaticity in the RDKit book, and realised that maybe this
> isn't a trivial exercise!

Correct. Counting the number of non-fused rings that are aromatic,
like the post you reference does, is pretty easy; including the fused
rings that are aromatic is more challenging.

> I would like the count to count aromatic ring-systems such that bicyclic (eg
> indole or naphthalene) would only count as 1.  For reference, this appears
> to be the behaviour of the OpenEye OEDetermineAromaticRingSystems function -
> where the molecule derived from the smiles
> "C(O)(=O)c1cccc2c1[nH]c(C3CCCc4ccccc34)c2" (which contains an indole and a
> tetrahydronaphthalene) gives a count of 2.
>
> Any help would be greatly appreciated.

I've attached a script that's not quite what you want, but it gets you
almost there: it finds all aromatic ring systems, including fused
ones. Anthracene, for example, gives 6 rings. The modifications to
this to get what you're looking for aren't a straightforward
post-processing step, but shouldn't be too bad. If there's not enough
here, let me know and I will take a look at adding the extra code.

This code isn't perfectly polished and could certainly be faster, but
it does seem mostly functional.

-greg
#
# Copyright (C) 2010 Greg Landrum
#
from rdkit import Chem

def IsRingAromatic(ring,aromaticBonds):
    # True only if every bond in the ring is flagged aromatic
    for bidx in ring:
        if not aromaticBonds[bidx]:
            return False
    return True

def HasRingAromatic(ring,aromaticBonds):
    # True if at least one bond in the ring is flagged aromatic
    for bidx in ring:
        if aromaticBonds[bidx]:
            return True
    return False

def GetFusedRings(rings):
    # rings is a list of sets of bond indices; the result holds the
    # original rings plus the outer envelope of every fused combination
    res=rings[:]

    pool=[]
    for i in range(len(rings)):
        for j in range(i+1,len(rings)):
            ovl=rings[i]&rings[j]
            if ovl:
                # drop the shared bonds to get the envelope of the pair
                fused=rings[i]|rings[j]
                fused.difference_update(ovl)
                pool.append(fused)
    while pool:
        res.extend(pool)
        nextRound=[]
        for ringi in rings:
            li=len(ringi)
            for poolj in pool:
                ovl = ringi&poolj
                if ovl:
                    lj = len(poolj)
                    fused = ringi|poolj
                    fused.difference_update(ovl)
                    lf = len(fused)
                    if lf>li and lf>lj and fused not in nextRound and fused not in res:
                        nextRound.append(fused)
        pool = nextRound
    return res

def FindAromaticRings(mol):
    # flag whether or not bonds are aromatic:
    aromaticBonds = [0]*mol.GetNumBonds()
    for bond in mol.GetBonds():
        if bond.GetIsAromatic():
            aromaticBonds[bond.GetIdx()]=1

    # get the list of all rings:
    ri = mol.GetRingInfo()
    # collect the ones that have at least one aromatic bond:
    rings=[set(x) for x in ri.BondRings() if HasRingAromatic(x,aromaticBonds)]

    # generate all fused ring systems from that set
    fusedRings=GetFusedRings(rings)

    aromaticRings = [x for x in fusedRings if IsRingAromatic(x,aromaticBonds)]
    return aromaticRings

if __name__=='__main__':
    m = Chem.MolFromSmiles('C1=CC2=CC=CC=CC2=C1')
    print(FindAromaticRings(m))

    m = Chem.MolFromSmiles('C1=CC2=C(C=C1)C=C1C=C3C=C4C=C5C(C=CC=C5C5=CC=CC6=C5C=CC=C6)=CC4=CC3=CC1=C2')
    print(FindAromaticRings(m))
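
As a small illustration of the fusing step in GetFusedRings, here is a sketch using hand-numbered bond indices for a naphthalene-like bicycle. The numbering is an assumption made up for the example and does not come from RDKit. It shows that taking the union of two rings and dropping the shared bonds is the same as the set symmetric difference, and that the result is the outer perimeter of the fused pair.

```python
# Hypothetical bond numbering: ring A uses bonds 0-5, ring B uses bonds 5-10,
# and bond 5 is the fusion bond shared by both rings.
ring_a = set(range(0, 6))
ring_b = set(range(5, 11))

shared = ring_a & ring_b

# The two-step form used in the script ...
envelope = ring_a | ring_b
envelope.difference_update(shared)

# ... is equivalent to the symmetric difference of the two bond sets.
assert envelope == ring_a ^ ring_b

print(sorted(shared))  # [5]
print(len(envelope))   # 10 bonds around the outside of the bicycle
```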


Re: [Rdkit-discuss] Number of Aromatic Rings

2010-06-11 Thread James Davidson
Thanks Greg,

The code looks pretty shiny to me!  I hope I can find time over the weekend to 
look at doing the post-processing, and will let you know how I get on.

Kind regards

James

 



Re: [Rdkit-discuss] Number of Aromatic Rings

2010-06-11 Thread James Davidson
Hi Greg

Well, I managed to have a go at this earlier than I expected.  So first some 
apologies, provisos, and caveats to warn you, and other readers, that your eyes 
will soon experience things inelegant and unpythonic, but it's the best I could 
come up with, with my limited faculties and experience!

On the plus side - I think it is doing what I wanted - i.e. giving a count of the 
number of aromatic systems (if you always want to count a fused aromatic as 1 
aromatic system).  The downside is that the way I have done this now makes your 
script output, e.g., (6,1) for anthracene - where the 1 is the count of aromatic 
systems (fused or otherwise).  It would be most generic if it returned 
(6,3,1) as (all unique aromatic substructures, unique mono-cyclic 
substructures, aromatic systems).  I'm sure this is fairly straightforward, but 
for another day!

So what I added was:



def GetOuterSet(rings):
    # Initialise a counter for parent aromatic 'super' rings
    result = 0

    # Set-up a dictionary so that items can be referenced and deleted
    ring_set = {}
    for k, v in enumerate(rings):
        ring_set[k] = v

    # While there is something to process
    while len(ring_set):
        # Set the ring to be checked as the last in the list - should be the biggest
        reference = sorted(ring_set)[-1]

        # items() rather than the Python 2-only iteritems(); sorted() returns a
        # list, so popping from the dictionary inside the loop is safe
        for k,v in sorted(ring_set.items()):
            # if current item shares bonds with last item - remove current from dictionary
            if v&ring_set[reference]:
                ring_set.pop(k)
            # If we are at the reference, then we have found our 'super' ring
            if k == reference:
                result += 1
                break

    return result



and I passed in the aromaticRings list from your script, then returned both the 
length of the aromaticRings list (as before) plus the output of GetOuterSet().  
ie:


superRings = GetOuterSet(aromaticRings)

return len(aromaticRings), superRings


So once again, thanks for the help, and I would welcome any pointers from 
anyone on tidying-up and improving this modification!  (or corrections if 
anyone spots them - I have only briefly tested this)


Kind regards

James
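
One possible alternative for the final count, sketched under stated assumptions: rather than post-processing the fused envelopes, group the individual fully aromatic rings into systems whenever they share a bond, and count the groups. The function name count_ring_systems and the hand-numbered bond indices are hypothetical; this is not the method from the thread, just a plain connected-grouping sketch over sets of bond indices.

```python
def count_ring_systems(rings):
    """Count groups of rings that are connected through shared bonds.

    `rings` is an iterable of sets of bond indices, one set per ring.
    Only individual rings should be passed in, not fused envelopes.
    """
    systems = []  # pairwise-disjoint unions of bond indices, one per system
    for ring in rings:
        ring = set(ring)
        # Existing systems that share at least one bond with this ring
        touching = [s for s in systems if s & ring]
        # Keep the untouched systems and add one merged system
        systems = [s for s in systems if not (s & ring)]
        systems.append(ring.union(*touching))
    return len(systems)

# Hypothetical numbering: an indole (5-ring and 6-ring sharing bond 4) plus
# the separate benzene ring of a tetrahydronaphthalene.
indole_five = {0, 1, 2, 3, 4}
indole_six = {4, 5, 6, 7, 8, 9}
tetralin_benzene = {20, 21, 22, 23, 24, 25}
print(count_ring_systems([indole_five, indole_six, tetralin_benzene]))  # 2

# Anthracene-like chain of three rings, deliberately given out of order.
a, b, c = set(range(0, 6)), set(range(5, 11)), set(range(10, 16))
print(count_ring_systems([a, c, b]))  # 1
```

With RDKit, the input could plausibly be built as [set(r) for r in mol.GetRingInfo().BondRings() if all(mol.GetBondWithIdx(i).GetIsAromatic() for i in r)], so that the saturated ring of a tetrahydronaphthalene is filtered out before grouping. That wiring is an assumption and has not been checked against the molecules in this thread.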

