Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor
Thank you very much! This is really helpful!

Ali

On Wed, Aug 29, 2018 at 7:52 AM Richard Cooper <richardiancooper+rdkitdisc...@gmail.com> wrote:

> I think it depends on what you need the descriptor for. If it were for some kind of fingerprinting, the example implementation would be too noisy. We used it to estimate how many low energy conformations of a molecule might be present in a particular system - and it turned out that correlated well with our classifications of the system. The variability increases with RBC: for totally rigid systems RBC and nConf20 are zero. For more reproducible results you can increase the number of conformers generated; the cost is longer calculations, but if you only have 350 molecules this might be OK.
>
> In the paper there are two example molecules with RBC of 1 and 8 respectively which both have only a single low energy conformation, and it was this discrimination beyond simple RBC that drove its development.
>
> Analysis of the spread of nConf20 showed that it was larger than the spread of RBC, which might give it slightly better properties as an input descriptor. However, if you are finding less variability in your particular data set, then it might not be such a good discriminator of whatever you're trying to discriminate. I wouldn't recommend adopting it as the 'main descriptor' until you test whether it's useful.
>
> Regards,
> Richard
>
> On Wed, Aug 29, 2018 at 3:24 PM Ali Eftekhari wrote:
>
>> Hi Dr. Cooper,
>>
>> Thanks for your response and the suggestions. I added randomSeed=737 and I now get value of 14 for descriptor nConf20 for ZINC000290539224 molecule (although it is different than your paper [the value is 10] it does not change on each run). My concern now is on the general usage of nConf20 descriptor. For instance, is there a limitation on what molecules can be used for estimating their nConf20? Since the conformers are generated randomly, how reliable is this descriptor to use it as a replacement for Rotatable Bond Count (RBC) in all machine learning models.
>>
>> In my application, the calculated values of RBC for 350 molecules range from 0 to 7 with (80% between 0-4 and 20% between 5-7). The calculated values of nconf20 is between 0-40 but with 95% between 0-3. Since nConf20 for majority of molecules is between 0-3, I am concerned on the usage of nconf20 as the main descriptor. Could you please comment on that?
>>
>> Thanks,
>> Ali
>>
>> On Wed, Aug 29, 2018 at 6:32 AM Richard Cooper <richardiancooper+rdkitdisc...@gmail.com> wrote:
>>
>>> Just to follow up with the details - here is the line in the script to change:
>>>
>>>     conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)
>>>
>>> to
>>>
>>>     conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3, randomSeed=737)
>>>
>>> (where 737 is an integer constant of your choice, but not -1).
>>>
>>> Richard
>>>
>>> On Tue, Aug 28, 2018 at 12:55 PM Richard Cooper <richardiancooper+rdkitdisc...@gmail.com> wrote:
>>>
>>>> Hi Ali,
>>>>
>>>> Sorry I missed your email.
>>>>
>>>> The behaviour you describe is correct, due to a random seed in the conformer generation step. The descriptor value usually doesn't vary by too much.
>>>>
>>>> I think you can give the conformer generation a constant random seed if you need a reproducible number for nConf20.
>>>>
>>>> Regards, Richard
>>>>
>>>> On Tue, 28 Aug 2018, 00:25 Ali Eftekhari wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am trying to calculate 3D Descriptors following this publication:
>>>>> "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor", Jerome G. P. Wicker and Richard I. Cooper. J. Chem. Inf. Model. 2016, 56, 2347-2352
>>>>>
>>>>> I am essentially using the same script as they have in the supporting information and i have attached it here as well. In Table 2 from the above calculation, the value of the descriptor (nConf20) for ZINC000290539224 molecule is listed as 10. However, when I run the exact code as the one they used, I get different value at each run.
>>>>>
>>>>> I have already contacted the authors but got no response. I am wondering if the code they have in the supporting information is not right or the value they listed in the table is wrong?
>>>>>
>>>>> The SMILES string for this particular molecule is:
>>>>> 'CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O'
>>>>>
>>>>> Thanks in advance for your help!

--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor
I think it depends on what you need the descriptor for. If it were for some kind of fingerprinting, the example implementation would be too noisy. We used it to estimate how many low-energy conformations of a molecule might be present in a particular system, and that estimate turned out to correlate well with our classifications of the system. The variability increases with RBC: for totally rigid systems RBC and nConf20 are both zero. For more reproducible results you can increase the number of conformers generated; the cost is longer calculations, but if you only have 350 molecules this might be OK.

In the paper there are two example molecules, with RBC of 1 and 8 respectively, which both have only a single low-energy conformation, and it was this discrimination beyond simple RBC that drove the descriptor's development.

Analysis of the spread of nConf20 showed that it was larger than the spread of RBC, which might give it slightly better properties as an input descriptor. However, if you are finding less variability in your particular data set, then it might not be such a good discriminator of whatever you're trying to discriminate. I wouldn't recommend adopting it as the 'main descriptor' until you test whether it's useful.

Regards,
Richard
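One way to act on the advice above is to measure how much the descriptor moves between random seeds before relying on it. The following is only a sketch under stated assumptions: `seed_spread` and `fake_descriptor` are hypothetical names introduced for illustration, and the stand-in function returns made-up values rather than real nConf20 results.

```python
import statistics
from typing import Callable, Dict, Iterable


def seed_spread(descriptor: Callable[[int], int],
                seeds: Iterable[int]) -> Dict[str, float]:
    """Evaluate a seeded descriptor for several seeds and summarise the spread."""
    values = [descriptor(seed) for seed in seeds]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Population standard deviation over the seeds that were tried.
        "stdev": statistics.pstdev(values),
    }


def fake_descriptor(seed: int) -> int:
    """Hypothetical stand-in for a seeded nConf20 calculation (not real data)."""
    return 10 + seed % 3


summary = seed_spread(fake_descriptor, range(6))
print(summary)
```

In practice the stand-in would be replaced by a wrapper that runs the conformer script for one molecule with the given seed. A large spread relative to the differences between molecules would suggest generating more conformers per molecule.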
Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor
Hi Dr. Cooper,

Thanks for your response and the suggestions. I added randomSeed=737 and I now get a value of 14 for the nConf20 descriptor of the ZINC000290539224 molecule. This differs from the value in your paper (10), but it no longer changes from run to run.

My concern now is the general usage of the nConf20 descriptor. For instance, is there a limitation on which molecules it can be estimated for? Since the conformers are generated randomly, how reliable is this descriptor as a replacement for Rotatable Bond Count (RBC) in all machine learning models?

In my application, the calculated values of RBC for 350 molecules range from 0 to 7 (80% between 0 and 4, and 20% between 5 and 7). The calculated values of nConf20 range from 0 to 40, but 95% of them lie between 0 and 3. Since nConf20 for the majority of molecules is between 0 and 3, I am concerned about using nConf20 as the main descriptor. Could you please comment on that?

Thanks,
Ali
Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor
Just to follow up with the details, here is the line in the script to change:

    conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)

to

    conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3, randomSeed=737)

(where 737 is an integer constant of your choice, but not -1).

Richard
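The effect of the change can be checked directly by embedding the same molecule twice with the same seed and comparing the results. This is a minimal sketch, assuming RDKit and NumPy are installed; the helper name `embed` and the choice of 10 conformers are illustrative and not part of the original script. In RDKit the default `randomSeed=-1` requests a non-reproducible seed, which is why -1 is excluded above.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def embed(smiles, num_confs, seed):
    """Embed conformers for a SMILES string with a fixed random seed."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    conf_ids = list(AllChem.EmbedMultipleConfs(
        mol, numConfs=num_confs, pruneRmsThresh=0.5, randomSeed=seed))
    return mol, conf_ids


smiles = "CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O"
mol_a, ids_a = embed(smiles, 10, 737)
mol_b, ids_b = embed(smiles, 10, 737)

# With a fixed seed, both runs should keep the same conformers
# and place every atom at the same coordinates.
same_ids = ids_a == ids_b
same_coords = all(
    np.allclose(mol_a.GetConformer(i).GetPositions(),
                mol_b.GetConformer(i).GetPositions())
    for i in ids_a
)
print(same_ids, same_coords)
```

A fixed seed makes a single run repeatable; it does not by itself say how much the descriptor would change under a different seed.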
Re: [Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor
Hi Ali,

Sorry I missed your email.

The behaviour you describe is correct; it is due to a random seed in the conformer generation step. The descriptor value usually doesn't vary by too much.

I think you can give the conformer generation a constant random seed if you need a reproducible number for nConf20.

Regards,
Richard

--
Richard Cooper
Head of Chemical Crystallography
Associate Professor of Chemistry
University of Oxford
http://www.xtl.ox.ac.uk/
[Rdkit-discuss] Capturing 3D Conformational Flexibility in a Single Descriptor
Hello all,

I am trying to calculate 3D descriptors following this publication:
"Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor", Jerome G. P. Wicker and Richard I. Cooper. J. Chem. Inf. Model. 2016, 56, 2347-2352

I am using essentially the same script as the one in the supporting information, and I have attached it here as well. In Table 2 of the above publication, the value of the descriptor (nConf20) for the ZINC000290539224 molecule is listed as 10. However, when I run the exact code they used, I get a different value on each run.

I have already contacted the authors but got no response. I am wondering whether the code in the supporting information is not right, or the value listed in the table is wrong.

The SMILES string for this particular molecule is:
'CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O'

Thanks in advance for your help!

Ali

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from collections import OrderedDict

def GenerateConformers(smil, numConfs):
    mol = Chem.MolFromSmiles(smil)
    molecule = Chem.AddHs(mol)
    conformerIntegers = []
    # Embed conformers, then minimise each one with MMFF
    conformers = AllChem.EmbedMultipleConfs(molecule, numConfs, pruneRmsThresh=0.5, numThreads=3)
    optimised_and_energies = AllChem.MMFFOptimizeMoleculeConfs(molecule, maxIters=600, numThreads=3, nonBondedThresh=100.0)
    EnergyDictionaryWithIDAsKey = {}
    FinalConformersToUse = {}
    # Keep only conformers whose optimisation converged (flag == 0)
    for conformer in conformers:
        optimised, energy = optimised_and_energies[conformer]
        if optimised == 0:
            EnergyDictionaryWithIDAsKey[conformer] = energy
            conformerIntegers.append(conformer)
    # Start the final set with the lowest-energy conformer
    lowestenergy = min(EnergyDictionaryWithIDAsKey.values())
    for k, v in EnergyDictionaryWithIDAsKey.items():
        if v == lowestenergy:
            lowestEnergyConformerID = k
    FinalConformersToUse[lowestEnergyConformerID] = lowestenergy
    molecule = AllChem.RemoveHs(molecule)
    matches = molecule.GetSubstructMatches(molecule, uniquify=False)
    maps = [list(enumerate(match)) for match in matches]
    # Add a conformer only if it is at least 1.0 RMS away from every conformer already kept
    for conformerID in EnergyDictionaryWithIDAsKey.keys():
        okayToAdd = True
        for finalconformerID in FinalConformersToUse.keys():
            RMS = AllChem.GetBestRMS(molecule, molecule, finalconformerID, conformerID, maps)
            if RMS < 1.0:
                okayToAdd = False
                break
        if okayToAdd:
            FinalConformersToUse[conformerID] = EnergyDictionaryWithIDAsKey[conformerID]
    sortedDictionary = OrderedDict(sorted(FinalConformersToUse.items(), key=lambda t: t[1]))
    energies = [val for val in sortedDictionary.values()]
    # Count the kept conformers within 20 energy units of the lowest one
    energy_descriptor = 0
    relative_energies = np.array(energies) - energies[0]
    for energy in relative_energies[1:]:
        if 0 <= energy < 20:
            energy_descriptor += 1
    return energy_descriptor

sm = 'CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O'
non_SMILES = ['C(F)C(F)(F)', 'C(OC1=C2)=NC1=CC=C2C(OC3=C4)=NC3=CC=C4', 'C(C=C1)=CC=C1OC(C=C2)=CC=C2O',
              'C(OC1=C2)=NC1=CC=C2C(OC3=C4)=NC3=CC=C4', 'CC(C(#N))CC(C(#N))', 'C(C=C1)=CC=C1OC(C=C2)=CC=C2O',
              'CC(C=C1)=CC=C1C(C=C2)=CC=C2O', 'NC(=O)NC(=O)']
print(GenerateConformers('CC(C)N2CC(NCc1cnc(C(C)O)s1)CC2=O', 50))
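The final counting step of the attached script can be looked at in isolation, which makes it easier to see why small changes in the conformer set change the result. The sketch below is an illustration only: `count_within_window` is a hypothetical helper, the energies in the example are made up, and returning 0 for an empty list is a choice made here (the script itself would fail in that case).

```python
from typing import Sequence


def count_within_window(energies: Sequence[float], window: float = 20.0) -> int:
    """Count conformers, excluding the lowest, within `window` of the lowest energy."""
    if not energies:
        return 0
    ordered = sorted(energies)
    lowest = ordered[0]
    # Skip the lowest-energy conformer itself, as the script does.
    return sum(1 for energy in ordered[1:] if 0 <= energy - lowest < window)


# Made-up energies: two conformers lie within 20 units of the minimum.
example = count_within_window([5.0, 0.0, 12.5, 30.0])
print(example)
```

Because the lowest conformer is excluded, a molecule with a single retained conformer gives 0. A conformer sitting just above or below the window edge, or one that is pruned in one run but not in another, changes the count by one, which is one way a run-to-run difference can arise when no seed is fixed.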