[Rdkit-discuss] delete a substructure
Hi everyone,

I am new to RDKit but I am already impressed by its vibrant community. I have a question about deleting a substructure. The RDKit documentation gives this snippet for deleting a substructure:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> m = Chem.MolFromSmiles("CC(=O)O")
>>> patt = Chem.MolFromSmarts("C(=O)[OH]")
>>> rm = AllChem.DeleteSubstructs(m, patt)
>>> Chem.MolToSmiles(rm)
'C'

This code first loads the molecule CH3COOH from its SMILES string, then defines the substructure to delete, COOH, as a SMARTS pattern. The final line outputs 'C' as SMILES.

I want to develop a method for counting the functional groups in a molecule. For CH3COOH, I can count the --CH3 and --COOH groups with their respective SMARTS patterns with no problem. However, as molecules become more complicated, I would prefer to delete each substructure once it has been counted before moving on to the next SMARTS search. In the current case, after finding the -COOH group and deleting it, the leftover is 'C', which is essentially CH4 instead of --CH3, so I cannot go on to search for --CH3 with its SMARTS pattern ([CH3;A;X4!R]).

Is there any way to work around this?

Thanks,
Chenyang
Re: [Rdkit-discuss] delete a substructure
Hi Greg,

Thanks for the prompt reply. I did try "GetSubstructMatches()" and it returns the correct number of substructures for CH3COOH. The potential problem with this approach is that as the molecule gets more complicated, it may count certain functional groups more than once. For example, the --OH (alcohol) group will likely also be counted within --COOH. A safer way, in my mind, is to remove each substructure once it has been counted.

Greg, you mentioned the "chemical reaction functionality". Could you show me a demo script that uses it, with CH3COOH as an example? I will definitely delve into the manual to learn more, but reading your code will be a good start.

Thanks,
Chenyang

On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum wrote:
> Hi Chenyang,
>
> If you're really interested in counting the number of times the substructure appears, you can do that much quicker with `GetSubstructMatches()`:
>
> In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
> In [3]: len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
> Out[3]: 2
>
> Is that sufficient, or do you actually want to sequentially remove all of the groups in your list?
>
> If you actually want to remove them, you are probably better off using the chemical reaction functionality instead of DeleteSubstructs(), which recalculates the number of implicit Hs on atoms after each call.
>
> -greg
Re: [Rdkit-discuss] delete a substructure
Hongbin and Greg,

Thank you both for the kind suggestions. I will try both approaches and report my progress later.

Best,
Chenyang

On Monday, March 6, 2017, Greg Landrum wrote:
> The solution that Hongbin proposes to the double-counting problem is a good one. Just be sure to sort your substructure queries in the right order so that the more complex ones come first.
>
> Another thing you might think about is making your queries more specific. For example, as you pointed out, "[OH]" is very general and matches parts of carboxylic acids and a number of other functional groups. The RDKit has a set of fairly well tested (though certainly not perfect) functional group definitions in $RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol definition from there looks like this:
> [O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]
>
> -greg
>
> On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 wrote:
>> Hi, Chenyang,
>>
>> You don't need to delete the substructure from the molecule. Just check whether the mapped atoms have already been matched. For example:
>>
>> m = Chem.MolFromSmiles('CC(=O)O')
>> OH = Chem.MolFromSmarts('[OH]')
>> COOH = Chem.MolFromSmarts('C(O)=O')
>>
>> m.GetSubstructMatches(OH)
>> ((3,),)
>> m.GetSubstructMatches(COOH)
>> ((1, 3, 2),)
>>
>> Since atom "3" has already been matched, it should be ignored. So you can create a "set" to record the matched atoms and avoid counting them repeatedly.
>>
>> --
>> Hongbin Yang 杨弘宾
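The bookkeeping Hongbin describes can be sketched without touching the molecule at all. The snippet below is only an illustration, not code from the thread: `count_without_overlap` is a hypothetical helper, and the match tuples are copied from the `GetSubstructMatches()` output quoted in Hongbin's message rather than recomputed. It assumes the groups are passed with the more complex patterns first, as Greg suggests.

```python
def count_without_overlap(ordered_matches):
    """Count groups in priority order, skipping matches that reuse atoms.

    ordered_matches: list of (group_name, matches) pairs, most specific
    group first; matches is a tuple of atom-index tuples in the form
    returned by GetSubstructMatches().
    """
    claimed = set()  # atom indices already assigned to some group
    counts = {}
    for name, matches in ordered_matches:
        n = 0
        for match in matches:
            # Accept a match only if none of its atoms was claimed earlier.
            if claimed.isdisjoint(match):
                claimed.update(match)
                n += 1
        counts[name] = n
    return counts


# Match tuples quoted in the thread for acetic acid, 'CC(=O)O'.
acetic_acid = [
    ("COOH", ((1, 3, 2),)),  # pattern C(O)=O
    ("OH", ((3,),)),         # pattern [OH]
]
print(count_without_overlap(acetic_acid))  # {'COOH': 1, 'OH': 0}
```

The order of the list matters: if "OH" were listed first it would claim atom 3 and the carboxylic acid would no longer be counted, which is why the more complex queries should come first.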
Re: [Rdkit-discuss] delete a substructure
Thanks Hongbin and Pavel for the suggestions. I am now confident that the approach Hongbin proposed to remove duplicate counts is a robust one. Now I need to revisit/recheck all my SMARTS definitions.

One last question: do you have convenient online or local documents for looking up SMARTS patterns? Greg mentioned $RDBASE/Data/Functional_Group_Hierarchy.txt, which comes with the RDKit installation. Brian suggested the Daylight website, http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html, which is a good place as well.

Best,
Chenyang

On Thu, Mar 9, 2017 at 1:09 AM, 杨弘宾 wrote:
> Hi Chenyang,
>
> Your issue was caused by the definition of "-OH (phenol)", I think. If you define this pattern as "cO", atom 3 will be matched, since it is the aromatic carbon bonded to an oxygen. I guess you just wanted to match exactly the oxygen and restrict it to "bonded to an aromatic carbon". So the SMARTS should be "[$(Oc)]", which indicates an oxygen in the environment "bonded to an aromatic carbon".
>
> m = Chem.MolFromSmiles('CC1=CC(=C(C=C1)C(=O)O)O')
> m.GetSubstructMatches(Chem.MolFromSmarts('[$(Oc)]'))
> ((10,),)
>
> Then only atom 10 will be matched and it won't interfere with other counts.
>
> Reference: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html, section 4.4
>
> --
> Hongbin Yang
>
> From: Chenyang Shi
> Date: 2017-03-09 01:32
> To: Greg Landrum
> CC: rdkit-discuss; 杨弘宾
> Subject: Re: [Rdkit-discuss] delete a substructure
>
>> Dear Hongbin,
>>
>> I tried your method on a molecule, 4-methylsalicylic acid (CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in the Joback method (using SMARTS), and used m.GetSubstructMatches to print out all atom positions. The result is summarized in the table.
>>
>> [image: Inline image 1]
>>
>> We can see there are duplicated counts coming from the COOH group. As suggested by Hongbin, we can remove duplicated atoms by looking at their positions: in this case, ((9),), ((7,8,),), ((7,),), and ((8,),) are subsets of ((7,8,9)) from -COOH. Indeed we can get rid of these duplicates. However, I also noticed that atom (3,) from the =C< (ring) group is also a part of -OH (phenol), ((10,3),). If we apply the same algorithm to remove duplicates, the =C< (ring) group will be counted only twice instead of three times.
>>
>> Greg, you mentioned that as an alternative I can delete substructures using the chemical reaction method. It would be greatly appreciated if you could show me (or point me to) a simple example, perhaps on a simple molecule. I find myself at a loss when browsing the manual. I would like to try that direction as well.
>>
>> Thanks,
>> Chenyang
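When rechecking a set of SMARTS definitions for this kind of interference, it can help to list which groups share atoms before any duplicates are removed. The following is a rough sketch under stated assumptions: `shared_atoms` is a hypothetical helper, the COOH, phenol and atom-3 tuples are the ones reported in the thread for 4-methylsalicylic acid, and the other two =C< (ring) matches, atoms 1 and 4, are assumed from reading the SMILES rather than taken from the (unpreserved) table.

```python
from itertools import combinations


def shared_atoms(matches_by_group):
    """Return {(group_a, group_b): sorted shared atom indices} for overlapping groups."""
    # Flatten every group's matches into one set of atom indices.
    atoms = {
        name: {idx for match in matches for idx in match}
        for name, matches in matches_by_group.items()
    }
    overlaps = {}
    for a, b in combinations(atoms, 2):
        common = atoms[a] & atoms[b]
        if common:
            overlaps[(a, b)] = sorted(common)
    return overlaps


# Phenol written so that the aromatic carbon is part of the match.
broad = {
    "COOH": ((7, 8, 9),),
    "phenol": ((10, 3),),
    "=C< (ring)": ((1,), (3,), (4,)),  # atoms 1 and 4 are assumed
}
# Phenol written as "[$(Oc)]", which reports only the oxygen.
narrow = dict(broad, phenol=((10,),))

print(shared_atoms(broad))   # {('phenol', '=C< (ring)'): [3]}
print(shared_atoms(narrow))  # {}
```

An empty result for the narrow definition is consistent with Hongbin's point that matching only the oxygen keeps the phenol count from interfering with the ring-carbon count.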
Re: [Rdkit-discuss] delete a substructure
Thank you Chris. I found that one too; it is quite convenient for visualizing both SMARTS and SMILES strings.

On Thu, Mar 9, 2017 at 11:28 AM, Chris Swain wrote:
> I use SMARTSviewer at Univ of Hamburg
>
> http://www.zbh.uni-hamburg.de/en/bioinformatics-server.html
>
> Chris
Re: [Rdkit-discuss] segmentation fault 11
Hi Greg,

Thank you for the quick response; it worked, both for RDKit and for the JRgui program I wrote (which uses RDKit). The error message seems a bit odd, but it is good to know a way to get around it.

Best,
Chenyang

On Fri, Oct 27, 2017 at 10:54 PM, Greg Landrum wrote:
> Hi Chenyang,
>
> This looks like the breakage caused by conda v4.3.27. There's some more information here:
> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg07325.html
>
> Best,
> -greg
>
> On Sat, Oct 28, 2017 at 5:27 AM, Chenyang Shi wrote:
>> Hi Everyone,
>>
>> I am writing to report a possible bug in RDKit on Mac.
>>
>> I have a program that uses Chem from rdkit. The program works fine on Linux and Windows systems. However, I had a hard time on macOS. I thought I might be doing something wrong myself until I ran a test on a clean macOS system on someone else's Mac (10.12.4 Sierra; mine is 10.10.3 Yosemite). On the Sierra computer, I downloaded Anaconda, installed it, and ran conda install rdkit as instructed. After I ran source activate my-rkdit-env, it failed to execute my program jrgui.py. When I then ran import rdkit, it worked fine; however, if I type from rdkit import Chem, it reports the error "segmentation fault 11".
>>
>> All of this is screen captured and attached to this email. I am not sure if this is a bug or something else. Do you have a hint as to what is going wrong?
>>
>> Thanks,
>> Chenyang
[Rdkit-discuss] SMARTS for =C=, #CH, #C-
Dear RDKitters,

I have a question regarding SMARTS patterns for three simple functional groups: =C=, #CH and #C-. I am new to SMARTS/SMILES, so I tried to guess the patterns. Here are my guesses:

=C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
#CH : [CH1;A;X2;!R]#[$(*)]
#C- : [CH0;A;X2;!R]#[$(*)]

I checked these SMARTS at http://smartsview.zbh.uni-hamburg.de/smartsview/calculate?method=get; they all seem to make sense. For example, the webpage prints out the following messages:

=C=: "aliphatic C with 0 further total connections, with 0 further hydrogen, not in a ring".
#CH: "aliphatic C with 0 further total connections, with 1 further hydrogen, not in a ring".
#C-: "aliphatic C with 1 further total connections, with 0 further hydrogen, not in a ring".

However, when I search for subgroups using these SMARTS, I run into problems. For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('C=C=O')
>>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
((1, 0, 2),)

it prints out atom positions 1, 0, 2 (three positions), but I would expect only one position, for the carbon in the middle.

Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('C#C')
>>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
((0, 1),)

I would expect two separate positions such as (0,), (1,), indicating that there are two triple-bonded carbons (each with a hydrogen).

Then if I search "CC#CC" using "[CH0;A;X2;!R]#[$(*)]",

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('CC#CC')
>>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R]#[$(*)]"))
((1, 2),)

Again, I would expect two separate positions such as (1,), (2,), indicating two triple-bonded carbons.

I think the problem might be that my SMARTS for these three groups are not specific enough. I would appreciate everyone's help on this.

Cheers,
Chenyang
Re: [Rdkit-discuss] SMARTS for =C=, #CH, #C-
Dear Andy,

Thank you for the quick and thorough email. I find it very instructive, although I need to read it a couple more times to digest it.

Cheers,
Chenyang

On Wed, Nov 8, 2017 at 2:27 PM, Andrew Dalke wrote:
> On Nov 8, 2017, at 21:00, Chenyang Shi wrote:
> > =C= : [CH0;A;X2;!R](=[$(*)])=[$(*)]
>
> The recursive SMARTS notation, which is the term inside of the [$(...)], finds a match for the entire pattern and returns the first atom in that pattern.
>
> > For example, if I search "C=C=O" using "[CH0;A;X2;!R](=[$(*)])=[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C=C=O')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH0;A;X2;!R](=[$(*)])=[$(*)]"))
> > ((1, 0, 2),)
> >
> > it prints out atomic positions 1, 0, 2--three positions. But I would expect only one position for the Carbon in the middle.
>
> The $(*) finds the pattern, which is a "*" and in this case the terminal carbons, and returns it. The substructure search returns 3 positions because the first is [CH0;A;X2;!R], the second is the first atom of "*", and the third is the first atom of the other "*".
>
> If you only want the first atom of the entire pattern, then put the entire pattern in a recursive SMARTS, as in:
>
> [$([CH0;A;X2;!R](=*)=*)]
>
> >>> pat = Chem.MolFromSmarts("[$([CH0;A;X2;!R](=*)=*)]")
> >>> mol = Chem.MolFromSmiles('C=C=O')
> >>> mol.GetSubstructMatches(pat)
> ((1,),)
>
> > Similarly, if I search "C#C" using "[CH1;A;X2;!R]#[$(*)]",
> > >>> from rdkit import Chem
> > >>> m = Chem.MolFromSmiles('C#C')
> > >>> m.GetSubstructMatches(Chem.MolFromSmarts("[CH1;A;X2;!R]#[$(*)]"))
> > ((0, 1),)
> > I would expect two separate positions such as (0,), (1,), indicating there are two carbon triple bonds (with an hydrogen).
>
> Since you are only looking for a single atom, try putting the entire pattern in a recursive SMARTS, as in
>
> [$([CH1;A;X2;!R]#*)]
>
> >>> mol = Chem.MolFromSmiles("C#C")
> >>> pat = Chem.MolFromSmarts("[$([CH1;A;X2;!R]#*)]")
> >>> mol.GetSubstructMatches(pat)
> ((0,), (1,))
>
> > Then if if I search "CC#CC" using " [CH0;A;X2;!R]#[$(*)]",
>
> I believe you want "[$([CH0;A;X2;!R]#*)]"
>
> Thank you for your clear description of what you expected.
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
[Rdkit-discuss] SMARTS for Joback and Reid method
Hi everyone,

I have recently been working on a project that implements the Joback method using RDKit (https://en.wikipedia.org/wiki/Joback_method). I believe the key to the success of this project is representing the 41 functional groups correctly in SMARTS. I have compiled my own patterns; see the attachment. I would appreciate it if you could review them and let me know if you spot errors. I think building a robust, well-tested SMARTS database (though small in my case) would be helpful to others and to other projects.

Thank you,
Chenyang

PS: The ones highlighted red in the document are robust.

Attachment: SMARTS.docx (MS-Word 2007 document)
Re: [Rdkit-discuss] SMARTS for Joback and Reid method
Dear Emanuel,

Thank you for pointing me to SMARTSviewer; it is a good place to check our patterns.

Cheers,
Chenyang

On Thu, Nov 9, 2017 at 4:44 AM, Emanuel Ehmki wrote:
> Dear Chenyang,
>
> at http://smartsview.zbh.uni-hamburg.de/ you will find a useful tool to visualize your SMARTS patterns and also get them checked for correctness.
>
> Best,
> Emanuel
>
> Jason Biggs wrote on Thu, Nov 9, 2017 at 00:51:
>> Chenyang,
>>
>> I haven't looked at your SMARTS strings yet, but I do have this list of SMARTS strings for the Joback method that I compiled myself (for use here: https://www.wolframalpha.com/input/?i=2,3-methano-5,6-dichloroindene&lk=3).
>>
>> Perhaps this can be of use. If you spot any mistakes, please let me know.
>>
>> Jason
>>
>> $JobackSubstructures={
>> {"Methyl","-CH3", "[CX4H3]"},
>> {"SecondaryAcyclic", "-CH2-", "[!R;CX4H2]"},
>> {"TertiaryAcyclic",">CH-", "[!R;CX4H]"},
>> {"QuaternaryAcyclic", ">C<", "[!R;CX4H0]"},
>> {"PrimaryAlkene", "=CH2", "[CX3H2]"},
>> {"SecondaryAlkeneAcyclic", "=CH-", "[!R;CX3H1;!$([CX3H1](=O))]"},
>> {"TertiaryAlkeneAcyclic", "=C<", "[$([!R;#6X3H0]);!$([!R;#6X3H0]=[#8])]"},
>> {"CumulativeAlkene", "=C=", "[$([CX2H0](=*)=*)]"},
>> {"TerminalAlkyne", "\[Congruent]CH","[$([CX2H1]#[!#7])]"},
>> {"InternalAlkyne","\[Congruent]C-","[$([CX2H0]#[!#7])]"},
>> {"SecondaryCyclic", "-CH2- (ring)", "[R;CX4H2]"},
>> {"TertiaryCyclic", ">CH- (ring)", "[R;CX4H]"},
>> {"QuaternaryCyclic", ">C< (ring)", "[R;CX4H0]"},
>> {"SecondaryAlkeneCyclic", "=CH- (ring)", "[R;CX3H1,cX3H1]"},
>> {"TertiaryAlkeneCyclic", "=C< (ring)","[$([R;#6X3H0]);!$([R;#6X3H0]=[#8])]"},
>> {"Fluoro", "-F", "[F]"},
>> {"Chloro", "-Cl", "[Cl]"},
>> {"Bromo", "-Br", "[Br]"},
>> {"Iodo", "-I", "[I]"},
>> {"Alcohol","-OH", "[OX2H;!$([OX2H]-[#6]=[O]);!$([OX2H]-a)]"},(* alcohol - not matching a carboxylic acid *)
>> {"Phenol","-OH", "[$([OX2H]-a)]"},
>> {"EtherAcyclic", "-O-", "[OX2H0;!R;!$([OX2H0]-[#6]=[#8])]"},
>> {"EtherCyclic", "-O- (ring)", "[#8X2H0;R;!$([#8X2H0]~[#6]=[#8])]"},
>> {"CarbonylAcyclic", ">C=O", "[$([CX3H0](=[OX1]));!$([CX3](=[OX1])-[OX2]);!R]=O"},
>> {"CarbonylCyclic", ">C=O (ring)","[$([#6X3H0](=[OX1]));!$([#6X3](=[#8X1])~[#8X2]);R]=O"},
>> {"Aldehyde","O=CH-","[CX3H1](=O)"},
>> {"CarboxylicAcid", "COOH", "[OX2H]-[C]=O"},
>> {"Ester", "-C(=O)O-", "[#6X3H0;!$([#6X3H0](~O)(~O)(~O))](=[#8X1])[#8X2H0]"},
>> {"OxygenDoubleBondOther", "=O", "[OX1H0;!$([OX1H0]~[#6X3]);!$([OX1H0]~[#7X3]~[#8])]"},
>> {"PrimaryAmino","NH2", "[NX3H2]"},
>> {"SecondaryAminoAcyclic",">NH", "[NX3H1;!R]"},
>> {"SecondaryAminoCyclic",">NH (ring)", "[#7X3H1;R]"},
>> {"TertiaryAmino", ">N-","[#7X3H0;!$([#7](~O)~O)]"}, (* Tertiary amine except nitro group *)
>> {"ImineCyclic","=N- (ring)","[#7X2H0;R]"},
>> {"ImineAcyclic","=N-","[#7X2H0;!R]"},
>> {"Aldimine", "=NH", "[#7X2H1]"},
>> {"Cyano", "-C\[Congruent]N","[#6X2]#[#7X1H0]"},
>> {"Nitro", "NO2", "[$([#7X3,#7X3+][!#8])](=[O])~[O-]"},
>> {"Thiol", "-SH", "[SX2H]"},
>> {"ThioetherAcyc
[Rdkit-discuss] convert a smiles file to a xyz file
Hi Everyone,

I am seeking help with converting a SMILES file to a set of atomic coordinates for the molecule, in XYZ format. I have seen some online services that can do the job (e.g. http://www.cheminfo.org/Chemistry/Cheminformatics/FormatConverter/index.html), but they are not convenient to use.

I am wondering how we can do this by writing RDKit code. A separate question: is the molecular structure converted from SMILES the same as the one taken from a crystal structure?

Many thanks!
Chenyang
Re: [Rdkit-discuss] convert a smiles file to a xyz file
Thank you all. It seems OpenBabel supports this. Here is a tutorial I found from Dr. Kulik's group that might be useful: http://hjkgrp.mit.edu/content/geometries-strings-smiles-and-openbabel

On Wed, May 23, 2018 at 10:59 AM, Benjamin Bucior <bbuc...@u.northwestern.edu> wrote:
> I'm not sure which flags the online tool uses, but it's based on Open Babel so you might have some success with that tool.
> http://open-babel.readthedocs.io/en/latest/3DStructureGen/Overview.html
>
> For a quick guess at the structure, an example with the command line tool is something like
> obabel -:"[O-]C(=O)c1ccc(cc1)C(=O)[O-]" --gen3D -O structure.xyz
>
> If your workflow is in Python, there are some make3D and addh (for hydrogens) convenience functions in the openbabel (or its pybel) package.
>
> As Dima mentioned, there are several challenges and sources of nonuniqueness in going from SMILES to 3D. Some of the conformer search tools in the link can help a little bit, but in general it's a tricky problem.
>
> Best,
> Ben
>
> On Wed, May 23, 2018 at 10:30 AM, Dimitri Maziuk via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
>> On 5/23/2018 10:23 AM, Chenyang Shi wrote:
>>> A separate question is that is the converted molecular structure from SMILES the same as that taken from a crystal structure?
>>
>> Provided there's no undefined/different stereochemistry on SMILES side, no quirks with added protons, and so on and so forth... for a small simple molecule... maybe.
>>
>> Dima
Re: [Rdkit-discuss] convert a smiles file to a xyz file
Thank you Prof. Jensen. I will study the module.

Best,
Chenyang

On Thu, May 24, 2018 at 1:30 AM, Jan Halborg Jensen wrote:
> Have a look at write_xtb_input_file in this module: https://github.com/jensengroup/take_elementary_step/blob/master/write_input_files.py
>
> The xtb input is simply an xyz file with some additional lines below if the molecule is charged. You can simply [remove] those lines in the code.
>
> Best regards, Jan
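For readers who want the RDKit-only route asked about in the original question, the following is a minimal sketch and not a copy of the module Jan links to. `format_xyz` and `smiles_to_xyz` are hypothetical names; the RDKit calls used (`MolFromSmiles`, `AddHs`, `EmbedMolecule`, `GetConformer`, `GetAtomPosition`) are standard, but the sketch assumes that a single embedded conformer with no further optimization is good enough for the purpose at hand.

```python
def format_xyz(symbols, coords, comment=""):
    """Format element symbols and (x, y, z) tuples (in angstroms) as XYZ text."""
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    # XYZ layout: atom count, one comment line, then one line per atom.
    lines = [str(len(symbols)), comment]
    for sym, (x, y, z) in zip(symbols, coords):
        lines.append(f"{sym:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


def smiles_to_xyz(smiles, seed=1234):
    """Embed one 3D conformer with RDKit and return it as XYZ text."""
    # Imported here so format_xyz stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError("RDKit could not embed a conformer")
    conf = mol.GetConformer()
    symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
    coords = []
    for i in range(mol.GetNumAtoms()):
        p = conf.GetAtomPosition(i)
        coords.append((p.x, p.y, p.z))
    return format_xyz(symbols, coords, comment=smiles)
```

Calling `smiles_to_xyz("CC(=O)O")` should give an XYZ block for acetic acid with hydrogens included. As Dima notes earlier in the thread, a generated conformer is only one plausible geometry and need not coincide with a crystal structure.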
[Rdkit-discuss] GETAWAY descriptors
Hi everyone,

I hope to calculate the R3m descriptor, which is among a family of GETAWAY descriptors originally proposed in this paper: https://pubs.acs.org/doi/pdf/10.1021/ci015504a

I tried writing the code below to calculate it, but what I obtained is a Python list of length 273. I also tested other molecules and always ended up with a list of length 273. I am not sure what each of the 273 numbers corresponds to, and in particular which one is the R3m descriptor. Can anyone help me understand it? Thank you.

Chenyang

from rdkit import Chem
from rdkit.Chem import AllChem

def return_getaway_descriptors(smiles):
    mol = Chem.MolFromSmiles(smiles)
    mol_hydrogen = Chem.AddHs(mol)
    AllChem.EmbedMolecule(mol_hydrogen, randomSeed = 1234)
    res = Chem.rdMolDescriptors.CalcGETAWAY(mol_hydrogen)
    return res, len(res)

chlorobenzene = 'C1=CC=C(C=C1)Cl'
print (return_getaway_descriptors(chlorobenzene))
Re: [Rdkit-discuss] [*External*] GETAWAY descriptors
Thank you Guillaume, this is very helpful.

On Sun, Apr 12, 2020 at 3:11 AM Guillaume GODIN <guillaume.go...@firmenich.com> wrote:
> Hello,
>
> /*
> std::vector GETAWAYNAMES={
> "ITH","ISH","HIC","HGM",
> "H0u","H1u","H2u","H3u","H4u","H5u","H6u","H7u","H8u","HTu",
> "HATS0u","HATS1u","HATS2u","HATS3u","HATS4u","HATS5u","HATS6u","HATS7u","HATS8u","HATSu",
> "H0m","H1m","H2m","H3m","H4m","H5m","H6m","H7m","H8m","HTm",
> "HATS0m","HATS1m","HATS2m","HATS3m","HATS4m","HATS5m","HATS6m","HATS7m","HATS8m","HATSm",
> "H0v","H1v","H2v","H3v","H4v","H5v","H6v","H7v","H8v","HTv",
> "HATS0v","HATS1v","HATS2v","HATS3v","HATS4v","HATS5v","HATS6v","HATS7v","HATS8v","HATSv",
> "H0e","H1e","H2e","H3e","H4e","H5e","H6e","H7e","H8e","HTe",
> "HATS0e","HATS1e","HATS2e","HATS3e","HATS4e","HATS5e","HATS6e","HATS7e","HATS8e","HATSe",
> "H0p","H1p","H2p","H3p","H4p","H5p","H6p","H7p","H8p","HTp",
> "HATS0p","HATS1p","HATS2p","HATS3p","HATS4p","HATS5p","HATS6p","HATS7p","HATS8p","HATSp",
> "H0i","H1i","H2i","H3i","H4i","H5i","H6i","H7i","H8i","HTi",
> "HATS0i","HATS1i","HATS2i","HATS3i","HATS4i","HATS5i","HATS6i","HATS7i","HATS8i","HATSi",
> "H0s","H1s","H2s","H3s","H4s","H5s","H6s","H7s","H8s","HTs",
> "HATS0s","HATS1s","HATS2s","HATS3s","HATS4s","HATS5s","HATS6s","HATS7s","HATS8s","HATSs",
> "RCON","RARS","REIG",
> "R1u","R2u","R3u","R4u","R5u","R6u","R7u","R8u","RTu",
> "R1u+","R2u+","R3u+","R4u+","R5u+","R6u+","R7u+","R8u+","RTu+",
> "R1m","R2m","R3m","R4m","R5m","R6m","R7m","R8m","RTm",
> "R1m+","R2m+","R3m+","R4m+","R5m+","R6m+","R7m+","R8m+","RTm+",
> "R1v","R2v","R3v","R4v","R5v","R6v","R7v","R8v","RTv",
> "R1v+","R2v+","R3v+","R4v+","R5v+","R6v+","R7v+","R8v+","RTv+",
> "R1e","R2e","R3e","R4e","R5e","R6e","R7e","R8e","RTe",
> "R1e+","R2e+","R3e+","R4e+","R5e+","R6e+","R7e+","R8e+","RTe+",
> "R1p","R2p","R3p","R4p","R5p","R6p","R7p","R8p","RTp",
> "R1p+","R2p+","R3p+","R4p+","R5p+","R6p+","R7p+","R8p+","RTp+",
> "R1i","R2i","R3i","R4i","R5i","R6i","R7i","R8i","RTi",
> "R1i+","R2i+","R3i+","R4i+","R5i+","R6i+","R7i+","R8i+","RTi+",
> "R1s","R2s","R3s","R4s","R5s","R6s","R7s","R8s","RTs",
> "R1s+","R2s+","R3s+","R4s+","R5s+","R6s+","R7s+","R8s+","RTs+"};
> */
>
> This is the list of descriptor names in GETAWAY.
>
> Best regards,
>
> guillaume
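The name list Guillaume posted has a regular layout, so the position of any one descriptor can be worked out rather than counted by hand. The sketch below rebuilds the same 273 names programmatically and looks up R3m. `getaway_names` is a hypothetical helper, and the sketch rests on one assumption that the thread does not state explicitly: that the values returned by `CalcGETAWAY` come in the same order as the posted names.

```python
WEIGHTS = "umvepis"  # weighting suffixes, in the order they appear in the posted list


def getaway_names():
    """Rebuild the GETAWAY descriptor names in the order of the posted list."""
    names = ["ITH", "ISH", "HIC", "HGM"]
    for w in WEIGHTS:
        names += [f"H{k}{w}" for k in range(9)] + [f"HT{w}"]
        names += [f"HATS{k}{w}" for k in range(9)] + [f"HATS{w}"]
    names += ["RCON", "RARS", "REIG"]
    for w in WEIGHTS:
        names += [f"R{k}{w}" for k in range(1, 9)] + [f"RT{w}"]
        names += [f"R{k}{w}+" for k in range(1, 9)] + [f"RT{w}+"]
    return names


NAMES = getaway_names()
R3M_INDEX = NAMES.index("R3m")
print(len(NAMES), R3M_INDEX)  # 273 167
```

Under that ordering assumption, R3m would be `res[R3M_INDEX]`, where `res` is the list returned by `CalcGETAWAY` in the original question, and `dict(zip(NAMES, res))` would label all 273 values at once.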