Hongbin and Greg,
Thank you both for your kind suggestions. I will try both approaches and
report my progress later.
Best,
Chenyang

On Monday, March 6, 2017, Greg Landrum <greg.land...@gmail.com> wrote:

> The solution that Hongbin proposes to the double-counting problem is a
> good one. Just be sure to sort your substructure queries in the right order
> so that the more complex ones come first.
>
> Another thing you might think about is making your queries more specific.
> For example, as you pointed out "[OH]" is very general and matches parts of
> carboxylic acids and a number of other functional groups. The RDKit has a
> set of fairly well tested (though certainly not perfect) functional group
> definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
> definition from there looks like this:
> [O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
>
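To illustrate the difference in specificity, the following sketch compares the general `[OH]` query with the alcohol definition quoted above on acetic acid and ethanol. The helper name `n_matches` is hypothetical, and the example assumes RDKit is installed.

```python
from rdkit import Chem

# The general hydroxyl query from the thread, and the alcohol definition
# quoted above from Functional_Group_Hierarchy.txt.
general_oh = Chem.MolFromSmarts('[OH]')
alcohol = Chem.MolFromSmarts('[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]')


def n_matches(smiles, pattern):
    """Return how many times `pattern` matches the molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(pattern))


for name, smiles in [('acetic acid', 'CC(=O)O'), ('ethanol', 'CCO')]:
    print(name, n_matches(smiles, general_oh), n_matches(smiles, alcohol))
```

With these inputs, `[OH]` should match once in both molecules, while the alcohol definition should match only ethanol, because its recursive SMARTS rejects an oxygen whose carbon neighbor carries a non-ring double bond to O, N, or S.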
>
> -greg
>
>
> On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 <yanyangh...@163.com> wrote:
>
>> Hi, Chenyang,
>>     You don't need to delete the substructure from the molecule. Just
>> check whether the mapped atoms have already been matched. For example:
>>
>> from rdkit import Chem
>>
>> m = Chem.MolFromSmiles('CC(=O)O')  # acetic acid
>> OH = Chem.MolFromSmarts('[OH]')
>> COOH = Chem.MolFromSmarts('C(O)=O')
>>
>> m.GetSubstructMatches(OH)
>> >> ((3,),)
>> m.GetSubstructMatches(COOH)
>> >> ((1, 3, 2),)
>>
>> Since atom "3" has already been matched, it should be ignored.
>> So you can create a "set" to record the matched atoms and avoid
>> counting them twice.
>>
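A minimal sketch of the set-based bookkeeping described above, written in plain Python so that it runs without RDKit. The function name `count_groups` and its input layout are assumptions made for illustration; the match tuples are the ones shown in the message, and in practice they would come from `m.GetSubstructMatches(...)`. The groups are listed most specific first, following Greg's advice about ordering.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Match = Tuple[int, ...]


def count_groups(ordered_matches: Sequence[Tuple[str, Iterable[Match]]]) -> Dict[str, int]:
    """Count groups while skipping matches that reuse an already counted atom.

    `ordered_matches` holds (group name, matches) pairs, most specific group
    first. Each match is a tuple of atom indices.
    """
    used: Set[int] = set()  # atom indices claimed by earlier matches
    counts: Dict[str, int] = {}
    for name, matches in ordered_matches:
        counts[name] = 0
        for match in matches:
            if used.isdisjoint(match):  # no atom of this match was counted before
                counts[name] += 1
                used.update(match)
    return counts


# Matches for acetic acid, CC(=O)O, copied from the message above.
acetic_acid = [
    ("COOH", ((1, 3, 2),)),
    ("OH", ((3,),)),
]
print(count_groups(acetic_acid))  # {'COOH': 1, 'OH': 0}
```

Reversing the order of the two entries would count the hydroxyl first and then reject the carboxylic acid match, which is why the more complex patterns need to come first.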
>> ------------------------------
>> Hongbin Yang 杨弘宾
>>
>>
>> *From:* Chenyang Shi <cs3...@columbia.edu>
>> *Date:* 2017-03-06 14:04
>> *To:* Greg Landrum <greg.land...@gmail.com>
>> *CC:* RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
>> *Subject:* Re: [Rdkit-discuss] delete a substructure
>> Hi Greg,
>>
>> Thanks for the prompt reply. I did try "GetSubstructMatches()" and it
>> returns the correct number of substructures for CH3COOH. The potential
>> problem with this approach is that, as the molecule gets more complicated,
>> it may double count certain functional groups. For example, an --OH
>> (alcohol) group will likely also be counted as part of --COOH. A safer
>> way, in my mind, is to remove each substructure once it has been counted.
>>
>> Greg, you mentioned the "chemical reaction functionality". Can you show
>> me a demo script that uses it, with CH3COOH as an example? I will
>> definitely delve into the manual to learn more, but reading your code
>> will be a good start.
>>
>> Thanks,
>> Chenyang
>>
>>
>>
>> On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum <greg.land...@gmail.com> wrote:
>>
>>> Hi Chenyang,
>>>
>>> If you're really interested in counting the number of times the
>>> substructure appears, you can do that much more quickly with
>>> `GetSubstructMatches()`:
>>>
>>> In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
>>> In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
>>> Out[3]: 2
>>>
>>> Is that sufficient, or do you actually want to sequentially remove all
>>> of the groups in your list?
>>>
>>> If you actually want to remove them, you are probably better off using
>>> the chemical reaction functionality instead of DeleteSubstructs(), which
>>> recalculates the number of implicit Hs on atoms after each call.
>>>
>>> -greg
>>>
>>>
>>> On Mon, Mar 6, 2017 at 4:21 AM, Chenyang Shi <cs3...@columbia.edu> wrote:
>>>
>>>> I am new to RDKit but I am already impressed by its vibrant community.
>>>> I have a question about deleting a substructure. The RDKit
>>>> documentation has this snippet of code describing how to delete a
>>>> substructure:
>>>>
>>>> >>> from rdkit import Chem
>>>> >>> from rdkit.Chem import AllChem
>>>> >>> m = Chem.MolFromSmiles("CC(=O)O")
>>>> >>> patt = Chem.MolFromSmarts("C(=O)[OH]")
>>>> >>> rm = AllChem.DeleteSubstructs(m, patt)
>>>> >>> Chem.MolToSmiles(rm)
>>>> 'C'
>>>>
>>>> This block of code first loads the molecule CH3COOH from its SMILES
>>>> string, then defines the substructure to be deleted, COOH, as a SMARTS
>>>> pattern. After the final line, the program outputs 'C' in SMILES form.
>>>>
>>>> I want to develop a method for counting the functional groups in a
>>>> molecule. In the CH3COOH case, I can count the --CH3 and --COOH groups
>>>> using their respective SMARTS patterns with no problem. However, when the
>>>> molecule becomes more complicated, it is preferable to delete each
>>>> substructure once it has been counted before moving on to the next SMARTS
>>>> search. In the current case, after searching for the --COOH group and
>>>> deleting it, the leftover is 'C', which is essentially CH4 rather than
>>>> --CH3, so I cannot proceed with the SMARTS search for --CH3
>>>> ([CH3;A;X4!R]).
>>>>
>>>> Is there any way to work around this?
>>>> Thanks,
>>>> Chenyang
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------
>>>> ------------------
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>>
>>>
>>
>>
>>
>
