Hi Chenyang,
I think your issue was caused by the definition of "-OH (phenol)". If you
define this pattern as "cO", atom 3 will be matched as well, since it is the
aromatic carbon bonded to the oxygen. I guess you wanted to match only the
oxygen, restricted by "bonded to an aromatic carbon". In that case the SMARTS
should be "[$(Oc)]", which denotes an oxygen whose environment includes a bond
to an aromatic carbon.
m = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')
m.GetSubstructMatches(Chem.MolFromSmarts('[$(Oc)]'))
>>> ((10,),)
Then only atom 10 will be matched and it won't interfere with other counts.
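To make the difference concrete, here is a minimal sketch (not part of the original message) that runs both patterns on the same molecule. The variable names are only illustrative; the SMILES and the two SMARTS strings are the ones discussed above.

```python
from rdkit import Chem

# 4-methylsalicylic acid, as in the message above
mol = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')

# Two-atom pattern: the aromatic carbon is part of every match
two_atom = Chem.MolFromSmarts('cO')
# Recursive SMARTS: a single oxygen atom, with "bonded to aromatic carbon"
# acting only as a condition on that atom
oxygen_only = Chem.MolFromSmarts('[$(Oc)]')

two_atom_matches = mol.GetSubstructMatches(two_atom)
oxygen_matches = mol.GetSubstructMatches(oxygen_only)

print(two_atom_matches)  # each match contains the ring carbon and the oxygen
print(oxygen_matches)    # each match contains the oxygen only
```

Because the recursive form returns one-atom matches, the ring carbon stays available for the =C< (ring) count.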
Reference: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html (section 4.4)
Hongbin Yang
From: Chenyang Shi
Date: 2017-03-09 01:32
To: Greg Landrum
CC: rdkit-discuss; 杨弘宾
Subject: Re: [Rdkit-discuss] delete a substructure
Dear Hongbin,
I tried your method on a molecule, 4-Methylsalicylic acid
(CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in Joback method
(using SMARTS), and used m.GetSubstructMatches to print out all atom positions.
The result is summarized in the table.
We can see there are duplicated counts coming from the COOH group. As suggested
by Hongbin, we can remove duplicated atoms by looking at their positions: in
this case, ((9,),), ((7,8),), ((7,),), and ((8,),) are subsets of ((7,8,9),)
from -COOH, so we can indeed get rid of these duplicates. However, I also
noticed that atom (3,) from the =C< (ring) group is also part of -OH (phenol),
((10,3),). If we apply the same algorithm to remove duplicates, the =C< (ring)
group will be counted only twice instead of three times.
Greg, you mentioned that, as an alternative, I can delete a substructure using
the chemical reaction functionality. I would greatly appreciate it if you could
show me (or point me to) a simple code example, perhaps on a simple molecule. I
find myself at a loss when browsing the manual, and I would like to try that
direction as well.
Thanks,
Chenyang
On Mon, Mar 6, 2017 at 1:52 AM, Greg Landrum <greg.land...@gmail.com> wrote:
The solution that Hongbin proposes to the double-counting problem is a good
one. Just be sure to sort your substructure queries in the right order so that
the more complex ones come first.
Another thing you might think about is making your queries more specific. For
example, as you pointed out "[OH]" is very general and matches parts of
carboxylic acids and a number of other functional groups. The RDKit has a set
of fairly well tested (though certainly not perfect) functional group
definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
definition from there looks like this:
[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
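As a quick illustration (a sketch added for clarity, not from the original message), the alcohol definition quoted above can be checked against ethanol and acetic acid. The helper name `count_alcohols` is hypothetical.

```python
from rdkit import Chem

# Alcohol definition quoted above from Functional_Group_Hierarchy.txt
ALCOHOL = Chem.MolFromSmarts('[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]')

def count_alcohols(smiles):
    """Return the number of atoms matching the alcohol definition."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(ALCOHOL))

ethanol_count = count_alcohols('CCO')         # plain alcohol
acid_count = count_alcohols('CC(=O)O')        # OH sits on a C=O carbon

print(ethanol_count, acid_count)
```

The hydroxyl of the acid is rejected because its carbon carries a double bond to O, which the inner `!$(C=!@[O,N,S])` condition excludes.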
-greg
On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 <yanyangh...@163.com> wrote:
Hi Chenyang,
You don't need to delete the substructure from the molecule. Just check whether
the mapped atoms have already been matched. For example:
m = Chem.MolFromSmiles('CC(=O)O')
OH = Chem.MolFromSmarts('[OH]')
COOH = Chem.MolFromSmarts('C(O)=O')
m.GetSubstructMatches(OH)
>> ((3,),)
m.GetSubstructMatches(COOH)
>> ((1, 3, 2),)
Since atom "3" has been already matched, it should be ignored. So you can
create a "set" to record the matched atoms to avoid repetitive count.
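A minimal sketch of that idea follows. It is an illustration only: the function name `count_groups` and the pattern list are assumptions, and the patterns are tried from most to least complex so that the larger group claims its atoms first.

```python
from rdkit import Chem

def count_groups(mol, patterns):
    """Count groups, skipping matches that reuse already claimed atoms.

    `patterns` is a list of (name, SMARTS) pairs, most complex first.
    """
    claimed = set()   # atom indices already assigned to some group
    counts = {}
    for name, smarts in patterns:
        query = Chem.MolFromSmarts(smarts)
        n = 0
        for match in mol.GetSubstructMatches(query):
            if claimed.isdisjoint(match):   # no atom was counted before
                claimed.update(match)
                n += 1
        counts[name] = n
    return counts

mol = Chem.MolFromSmiles('CC(=O)O')
patterns = [('COOH', 'C(=O)[OH]'), ('OH', '[OH]'), ('CH3', '[CH3;X4]')]
counts = count_groups(mol, patterns)
print(counts)
```

Note that this treats any shared atom as a conflict, so multi-atom patterns that include a neighbouring atom only as context can block later groups that legitimately need that atom.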
Hongbin Yang 杨弘宾
From: Chenyang Shi
Date: 2017-03-06 14:04
To: Greg Landrum
CC: RDKit Discuss
Subject: Re: [Rdkit-discuss] delete a substructure
Hi Greg,
Thanks for a prompt reply. I did try "GetSubstructMatches()" and it returns
correct numbers of substructures for CH3COOH. The potential problem with this
approach is that if the molecule is getting complicated, it will possibly
generate duplicate numbers for certain functional groups. For example, --OH
(alcohol) group will be likely also counted in --COOH. A safer way, in my mind,
is to remove the substructure that has been counted.
Greg, you mentioned "chemical reaction functionality", can you show me a demo
script with that using CH3COOH as an example. I will definitely delve into the
manual to learn more. But reading your code will be a good start.
Thanks,
Chenyang
On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum <greg.land...@gmail.com> wrote:
Hi Chenyang,
If you're really interested in counting the number of times the substructure
appears, you can do that much quicker with `GetSubstructMatches()`:
In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2
Is that sufficient, or do you actually want to sequentially remove all of the
groups in your list?
If you actually want to remove them, you are probably better off using the
chemical reaction functionality instead of DeleteSubstructs(), which
recalculates the number of implicit Hs on atoms after each call.
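For illustration, here is one way the reaction route might look on CH3COOH. This is a sketch under assumptions, not code from the thread: the reaction SMARTS is hypothetical, and it leaves a dummy atom (*) where the group was attached so that the neighbouring carbon still looks like a CH3 to later queries.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CC(=O)O')  # CH3COOH

# Hypothetical reaction: strip the carboxylic acid group and put a dummy
# atom on the neighbour, so the neighbour keeps its attachment point.
rxn = AllChem.ReactionFromSmarts('[*:1][CX3](=O)[OX2H1]>>[*:1]*')

products = rxn.RunReactants((mol,))
remainder = products[0][0]       # first product of the first match
Chem.SanitizeMol(remainder)      # recompute valences and H counts

# The methyl query can still be applied to what is left
methyl = Chem.MolFromSmarts('[CH3;X4]')
n_methyl = len(remainder.GetSubstructMatches(methyl))
print(Chem.MolToSmiles(remainder), n_methyl)
```

Products returned by RunReactants() are not sanitized, so the explicit SanitizeMol() call is needed before querying hydrogen counts.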
-greg
On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi <cs3...@columbia.edu> wrote:
I am new to rdkit but I am already impressed by its vibrant community. I have a
question regarding deleting substructure. In the RDKIT documentation, this is a
snippet of code describing how to delete substructure:
>>> m = Chem.MolFromSmiles("CC(=O)O")
>>> patt = Chem.MolFromSmarts("C(=O)[OH]")
>>> rm = AllChem.DeleteSubstructs(m, patt)
>>> Chem.MolToSmiles(rm)
'C'
This block of code first loads a molecule CH3COOH using SMILES code, then
defines a substructure COOH using SMARTS code which is to be deleted. After
final line of code, the program outputs 'C', in SMILES form.
I had wanted to develop a method for detecting number of groups in a molecule.
In CH3COOH case, I can search number of --CH3 and --COOH group by using their
respective SMARTS code with no problem. However, when molecule becomes more
complicated, it is preferred to delete the substructure that has been searched
before moving to next search using SMARTS code. Well, in current case, after
searching -COOH group and deleting it, the leftover is 'C' which is essentially
CH4 instead of --CH3. I cannot proceed with searching with SMARTS code for
--CH3 ([CH3;A;X4!R]).
Is there any way to work around this?
Thanks,
Chenyang
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Attachment: 4-Methylsalicyclic acid.png (54K)