[Rdkit-discuss] calculation of spherical harmonics coefficients for 3D molecules?
RDKit Discussion Group,

Has anyone implemented code to calculate spherical harmonics coefficients for 3D conformations of molecules? The approach I have in mind is described in Morris, R.J. et al., Real Spherical Harmonic expansion coefficients for protein binding pocket and ligand comparisons, Bioinformatics, 21 (2005) 2347-2355.

Regards,
Jim Metz

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] molecular dynamics using RDkit only
RDKit Discussion Group,

I am aware of RDKit scripts that use the MMFF force field to minimize small molecules. Has anyone written RDKit code to perform molecular dynamics (MD) of small molecules or protein-ligand complexes using only RDKit and the existing RDKit force fields? I am aware of a number of other programs that perform MD, but at present I am specifically interested in RDKit/Python-only codes, i.e., no other dependencies.

If anyone has code, even if preliminary, that calculates the potential energies, forces, velocities, accelerations, etc. needed to propagate the motions of the atoms, and is willing to share it, that would be much appreciated.

Regards,
Jim Metz
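The propagation step asked about above can be sketched with the velocity Verlet scheme. This is a minimal, dependency-free illustration, not an existing RDKit feature: `velocity_verlet`, `harmonic_force`, and the flat-list layout of coordinates are hypothetical names and choices made for this sketch. As an assumption, the force callback could wrap the negative gradient of an RDKit force field object; unit conversions (kcal/mol, Angstrom, atomic masses) and thermostats are deliberately left out.

```python
def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Propagate coordinates with the velocity Verlet integrator.

    positions, velocities, masses: flat lists with one entry per coordinate
    (hypothetical layout chosen for this sketch).
    force_fn: callable returning the force on each coordinate.
    """
    x = list(positions)
    v = list(velocities)
    f = force_fn(x)
    for _ in range(n_steps):
        # Half-step velocity update using the current forces (a = F / m).
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # Full-step position update using the half-step velocities.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions, then finish the velocity step.
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return x, v


def harmonic_force(x, k=1.0):
    """Toy force F = -k * x, standing in for a real force field gradient."""
    return [-k * xi for xi in x]


if __name__ == "__main__":
    x, v = velocity_verlet([1.0], [0.0], [1.0], harmonic_force, dt=0.01, n_steps=1000)
    print("position:", x[0], "velocity:", v[0])
```

Checking that the total energy of the toy oscillator stays close to its starting value is a quick sanity test before swapping in a molecular force field.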
[Rdkit-discuss] Structure-Based Drug Design
RDKit Discussion Group,

RDKit has quite a number of useful tools and algorithms for ligand-based drug design (LBDD). However, what about structure-based drug design (SBDD)? Perhaps a few questions to motivate the discussion:

1) Since RDKit includes the MMFF force field, does this mean that one can read in reasonably prepared proteins (from a PDB file) and ligands (from a MOL file) and compute energies of the complex, the protein, and the ligand, and from those interaction energies, etc.?

2) Visualization is clearly important in SBDD. Has anyone developed a tool that nicely integrates macromolecular editing and visualization with RDKit?

3) Given the visualization capabilities of Jupyter, has anyone developed Jupyter/RDKit scripts for #1 and #2?

I welcome thoughts and comments, especially from those who have been thinking about or are wrestling with SBDD and RDKit integration. Thank you.

Regards,
Jim Metz
[Rdkit-discuss] AM1-BCC charges for small molecules
RDKit Discussion Group,

I am interested in generating and assigning AM1-BCC charges to small molecules, preferably in batch mode. I understand this topic has been discussed previously, but has RDKit code been written to do this? Since this relies on the results of AM1 calculations, has anyone perhaps written RDKit code to calculate and assign the charges if I have already generated a MOPAC output file by some other means?

I greatly appreciate all the capabilities of RDKit, and I do not mean to be off-topic, but if someone is aware of a non-RDKit way to generate AM1-BCC charges, that might work for me. Hence, please let me know. Thank you.

Regards,
Jim Metz
Re: [Rdkit-discuss] atomic contributions to molecular polarizability
Greg,

Thank you for your response and for your GitHub site where you are generously providing snippets of code to help others. I agree that the atomic number to polarizability approach is crude. However, experimental polarizabilities for small organics are available in the CRC Handbook and other sources. Hence, the error of this approach can be readily estimated.

Thanks again for your help - greatly appreciated.

Regards,
Jim Metz

-----Original Message-----
From: Greg Landrum
To: James T. Metz
Cc: RDKit Discuss
Sent: Fri, Jan 11, 2019 1:51 am
Subject: Re: [Rdkit-discuss] atomic contributions to molecular polarizability

Hi Jim,

On Thu, Jan 10, 2019 at 11:59 PM James T. Metz via Rdkit-discuss wrote:

I would like to calculate and be able to visualize the atomic contributions to the total molecular polarizability of small organic molecules. Apparently there is a molecular descriptor, apol, that is the sum total from the atomic contributions to polarizability available in some programs e.g., CDK.

The CDK documentation doesn't include a citation for this (aside from a link to a web page that no longer exists), but since the code is there and this descriptor is super simple, it's easy to re-implement:
https://gist.github.com/greglandrum/7936fcf631bfdae0041e298421554bec

Does anyone have code that will compute and write out the atomic contributions to molecular polarizability to either the b factor column or perhaps the charge column of PDB or MOL2 files, respectively? I could then use other programs to visualize the structures with those numbers. Thank you.

The gist linked above shows how to store values in the temperature factors that end up in PDB output.

It's worth pointing out that this descriptor is pretty crude: the atomic contribution is determined solely by the atomic number, not by atom environment. The RDKit includes a more complicated (presumably more accurate?) polarizability descriptor: the Molar Refractivity (MR) values. The gist also shows how to get the atomic contributions to this.

I hope this helps,
-greg
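For readers who want to see what "storing a value in the temperature factor" means at the file level, here is a minimal sketch in plain Python. It is not the code from the gist mentioned above: `pdb_hetatm_line` and its default residue name, chain, and occupancy are hypothetical, and the example value is purely illustrative rather than a real polarizability. The sketch follows the fixed-column PDB record layout, where the temperature factor occupies columns 61-66.

```python
def pdb_hetatm_line(serial, name, element, x, y, z, bfactor,
                    res_name="LIG", chain="A", res_seq=1, occupancy=1.0):
    """Format one fixed-width HETATM record with a per-atom value as B-factor.

    Hypothetical helper for illustration; residue name, chain and occupancy
    defaults are arbitrary choices.
    """
    # Atom names of one-letter elements conventionally start in column 14.
    if len(element) == 2:
        name_field = f"{name:<4s}"[:4]
    else:
        name_field = f" {name:<3s}"[:4]
    return (
        f"HETATM{serial:5d} {name_field} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{bfactor:6.2f}"
        f"          {element:>2s}"
    )


if __name__ == "__main__":
    # 1.76 is an illustrative per-atom value, not a real polarizability.
    print(pdb_hetatm_line(1, "C1", "C", 1.0, -2.5, 3.25, 1.76))
```

Because the B-factor field is only six characters wide with two decimals, per-atom values may need rescaling before they are written.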
Re: [Rdkit-discuss] atomic contributions to molecular polarizability
Carlos,

Thank you for your suggestion. However, rather than wrestle with the technical details of obtaining polarizabilities from new quantum mechanics schemes, I am simply interested in extracting the atomic contributions that are apparently already available for a descriptor called apol, which is apparently in the CDK descriptor set. I am not sure, but I think there are some overlaps between CDK and RDKit descriptors.

I am hoping that someone has coded the atomic contributions for polarizability and hence, with minor modifications to the code, can expose the atomic contributions. At this point, I am willing to accept the approximate values from the apol descriptor.

Regards,
Jim Metz
[Rdkit-discuss] atomic contributions to molecular polarizability
RDKit Discussion Group,

I would like to calculate and be able to visualize the atomic contributions to the total molecular polarizability of small organic molecules. Apparently there is a molecular descriptor, apol, that is the sum total from the atomic contributions to polarizability available in some programs e.g., CDK.

Does anyone have code that will compute and write out the atomic contributions to molecular polarizability to either the b factor column or perhaps the charge column of PDB or MOL2 files, respectively? I could then use other programs to visualize the structures with those numbers. Thank you.

Regards,
Jim Metz
[Rdkit-discuss] How to encode atomic contributions to logP (hydrophobicity) in MOL2 formatted charge slot?
RDKit Discussion Group,

Given a set of small molecules as an SDF file, I would like to generate a MOL2 file where the atomic contributions to logP (hydrophobicity) from each atom, including hydrogens, have been calculated and are encoded in the partial atomic charge "slot" of the MOL2 file. Is this possible using RDKit?

I have found code that calculates the atomic contributions for the Crippen logP model and then generates a colorized 2D plot:

    >>> from rdkit.Chem import rdMolDescriptors
    >>> from rdkit.Chem.Draw import SimilarityMaps
    >>> contribs = rdMolDescriptors._CalcCrippenContribs(mol)
    >>> fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, [x for x, y in contribs], colorMap='jet', contourLines=10)

However, I would like to encode the atomic contributions as partial atomic charges so that this information can be written out in a MOL2 file for each atom. Does anyone have Python/RDKit code to do this? Thank you.

Regards,
Jim Metz
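One way to picture the "charge slot" idea is to format the MOL2 atom records directly, since the charge is simply the last numeric field of each atom line. The sketch below is plain Python and makes several assumptions: `mol2_atom_line` and `mol2_atom_block` are hypothetical helpers, the SYBYL atom types must be supplied by the caller, and the numbers are illustrative rather than computed Crippen contributions. A complete MOL2 file would also need the MOLECULE and BOND sections.

```python
def mol2_atom_line(atom_id, name, x, y, z, sybyl_type, charge,
                   subst_id=1, subst_name="LIG"):
    """Format one whitespace-delimited MOL2 atom record.

    The per-atom value (e.g. a logP contribution) is written in the
    field normally used for the partial charge.
    """
    return (
        f"{atom_id:>7d} {name:<8s}{x:>10.4f}{y:>10.4f}{z:>10.4f} "
        f"{sybyl_type:<6s}{subst_id:>4d} {subst_name:<8s}{charge:>10.4f}"
    )


def mol2_atom_block(atoms):
    """Build the @<TRIPOS>ATOM section from (name, x, y, z, type, value) tuples."""
    lines = ["@<TRIPOS>ATOM"]
    for i, (name, x, y, z, sybyl_type, value) in enumerate(atoms, start=1):
        lines.append(mol2_atom_line(i, name, x, y, z, sybyl_type, value))
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; real contributions would come from the
    # Crippen calculation quoted in the message above.
    atoms = [("C1", 0.0, 1.5, -2.0, "C.3", 0.1441),
             ("O1", 1.2, 1.5, -2.0, "O.3", -0.2893)]
    print(mol2_atom_block(atoms))
```

Programs that read the file will interpret the last column as a charge, so any downstream charge-based calculation on such a file would be meaningless; it is only useful for coloring and inspection.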
[Rdkit-discuss] Butina clustering with additional output
RDKit Discussion Group,

I note that RDKit can perform Butina clustering. Given an SDF of small molecules, I would like to cluster the ligands, but obtain additional information from the clustering algorithm. In particular, I would like to obtain the cluster number and the Tanimoto distance from the centroid for every ligand in the SDF. The centroid would obviously have a distance of 0.00.

Has anyone written additional RDKit code to extract this additional information? Thank you.

Regards,
Jim Metz
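The extra output described above can be derived by post-processing the clustering result. The sketch below is plain Python and rests on two assumptions about the inputs: that the clusters are tuples of molecule indices with the centroid listed first (as returned by `rdkit.ML.Cluster.Butina.ClusterData`), and that the distances are stored as the flattened lower triangle that was passed to the clustering. `tri_index` and `annotate_clusters` are hypothetical helper names.

```python
def tri_index(i, j):
    """Position of the (i, j) distance in a flattened lower-triangle list."""
    if i == j:
        raise ValueError("no stored distance for identical indices")
    if i < j:
        i, j = j, i
    return i * (i - 1) // 2 + j


def annotate_clusters(clusters, dists):
    """Map each molecule index to (cluster number, centroid index, distance).

    clusters: tuple of index tuples, centroid first in each tuple.
    dists: flattened lower-triangle distance list.
    """
    rows = {}
    for cluster_id, members in enumerate(clusters):
        centroid = members[0]
        for idx in members:
            # The centroid is at distance 0.0 from itself by definition.
            d = 0.0 if idx == centroid else dists[tri_index(idx, centroid)]
            rows[idx] = (cluster_id, centroid, d)
    return rows


if __name__ == "__main__":
    # Toy data: 4 molecules, pairs (1,0), (2,0), (2,1), (3,0), (3,1), (3,2).
    dists = [0.1, 0.8, 0.9, 0.7, 0.85, 0.2]
    clusters = ((0, 1), (2, 3))
    for idx, row in sorted(annotate_clusters(clusters, dists).items()):
        print(idx, row)
```

The resulting tuples could then be written back as SD properties on each molecule before saving the file.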
[Rdkit-discuss] descriptors beyond rotatable bond count and possible correlations with entropy
RDKit users,

Is there an RDKit descriptor (or code) to determine the largest number of contiguous rotatable bonds in a small molecule?

It seems likely that ligand conformational flexibility is somehow related to the entropy component of ligand binding. Has anyone made a plot of the experimental TdS term from calorimetry vs. any number of computational measures of ligand flexibility? Any correlation?

Regards,
Jim Metz
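One possible reading of "largest number of contiguous rotatable bonds" is the longest chain of rotatable bonds in which consecutive bonds share an atom. The sketch below implements that reading in plain Python; it is an assumption about the intended definition, and `longest_rotatable_chain` is a hypothetical helper. The input is a list of atom-index pairs, which, as an assumption, could be obtained by matching a rotatable-bond SMARTS such as `rdkit.Chem.Lipinski.RotatableBondSmarts` against the molecule.

```python
from collections import defaultdict


def longest_rotatable_chain(rot_bonds):
    """Length (in bonds) of the longest simple path made only of rotatable bonds.

    rot_bonds: iterable of (atom_idx_a, atom_idx_b) pairs.
    Uses an exhaustive depth-first search, which is fine for small molecules.
    """
    adjacency = defaultdict(set)
    for a, b in rot_bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    best = 0

    def dfs(atom, visited, length):
        nonlocal best
        best = max(best, length)
        for nbr in adjacency[atom]:
            if nbr not in visited:
                visited.add(nbr)
                dfs(nbr, visited, length + 1)
                visited.remove(nbr)

    for start in list(adjacency):
        dfs(start, {start}, 0)
    return best


if __name__ == "__main__":
    # Two separate stretches of rotatable bonds: 0-1-2-3 and 5-6.
    print(longest_rotatable_chain([(0, 1), (1, 2), (2, 3), (5, 6)]))
```

A branched set of rotatable bonds counts only its longest linear stretch under this definition, which may or may not be what a flexibility measure should capture.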
[Rdkit-discuss] How to get more CPU utilization for RDkit calculations
RDKit Discussion Group,

I am running the RDKit diverse MaxMin picker code on my Windows 7 64-bit computer (4 dual-core i7). I have tested the code on a small set of compounds and the results are OK. I am now running diverse subset picking on an SDF containing about 40k compounds. The CPU usage across all 8 CPUs is about 14-15%.

Are there any flags or parameters that can be set in the Python or RDKit code to get more CPU utilization? Thank you.

Jim Metz
Metz Research LLC
[Rdkit-discuss] Interest in a RDkit UGM in the USA midwest?
RDKit Discussion Group,

Is there interest in having an RDKit UGM in the USA Midwest, perhaps somewhere in the Chicago area? I would recommend a format similar to the European UGM, and I would insist that presenters deposit their slides and code on the RDKit GitHub site.

Regards,
Jim Metz
[Rdkit-discuss] appropriate literature reference for RDkit 2017.03.1
RDKit Discussion Group,

What is an appropriate literature reference for RDKit 2017.03.1? Thank you.

Regards,
Jim Metz
Northwestern University
[Rdkit-discuss] suggestions or ideas for SMILES for an electron?
RDKit Discussion Group,

I am trying to get RDKit to recognize reactants and products in redox reactions, including reactions that involve explicit electrons, using SMILES as patterns. Does anyone have any suggestions or ideas for how to deal with an explicit electron? I have tried:

    pattern = '[-1]'
    match = mol_object.GetSubstructMatches(pattern)

but this does not work. I am hesitant to use [H-1], which is read in and processed, but technically refers to hydride, which is a legitimate molecular species and is obviously not an electron.

I can think of (possibly) another hack involving a large atomic number (beyond typical organics), but I am wondering if someone has a more intelligent solution, or whether there is some unusual or special SMILES notation that I am not aware of. Ideas or suggestions are welcome. Thank you.

Regards,
Jim Metz
Northwestern University
[Rdkit-discuss] RDkit ionization state prediction at a given pH
RDKit Discussion Group,

Can RDKit predict the ionization state of a molecule at a given pH by calculating the pKa, the predominant species at that pH, etc.?

I have already checked the RDKit discussion email archive, and per my searching there was a response from Greg Landrum about 10-18-2017 stating that this was attempted, but was not successfully completed due to technical difficulties (and was abandoned?). Hence, I am inquiring whether anyone has done further work in this area. Thank you.

Regards,
Jim Metz
[Rdkit-discuss] automatic generation of reaction SMARTS in RDkit
RDKit discussion group,

Is it possible to automatically generate reaction SMARTS in RDKit from reactant(s) and product(s)? If so, are there various "trims" that can be performed to focus on the reaction center, going out a certain number of atoms, which could be used to control reaction specificity?

For example, suppose I have

    reactant_smiles = 'CCO'
    product_smiles = 'CC=O'

I am looking for Python/RDKit code that would automatically generate reaction SMARTS such as:

    [*:1]-[C]-[O]>>[*:1]-[C]=[O]

I would be grateful for any suggestions, code, literature examples, etc. Thank you.

Regards,
Jim Metz
[Rdkit-discuss] suggestions for comprehensive searchable database of natural products
RDKit Discussion Group,

My apologies in advance if my request is not appropriate for this discussion group.

Given a small molecule that might have some resemblance to natural products, can someone suggest a free, comprehensive, Python/RDKit-searchable database of natural products that would be suitable for similarity and substructure searching? I am aware of a few websites that permit searching on the website. If possible, I would like to search programmatically by running a Python/RDKit script on my local machine and have the structures of related molecules returned to my local script. I would prefer not to have to download and store a huge database.

Also, if possible, it would be important to return the organism(s) that produce the natural product. Pathway information would also be very helpful. I welcome comments and suggestions. Thank you.

Regards,
Jim Metz
Northwestern University
Re: [Rdkit-discuss] transformation of an atom or group of atoms by atom indices
Jason,

Exactly what I need to do. Thank you for your example code. Much appreciated.

Regards,
Jim Metz

-----Original Message-----
From: Jason Biggs <jasondbi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Thu, Nov 9, 2017 3:28 pm
Subject: Re: [Rdkit-discuss] transformation of an atom or group of atoms by atom indices

Something along these lines?

    from rdkit import Chem

    mol = Chem.MolFromSmiles('')
    emol = Chem.EditableMol(mol)
    emol.ReplaceAtom(3, Chem.Atom(1))  # replace atom index 3 with hydrogen (atomic number 1)
    mol = emol.GetMol()

    In [12]: Chem.MolToSmiles(mol)
    Out[12]: '[H]CCC'

    In [13]: Chem.MolToSmiles(Chem.RemoveHs(mol))
    Out[13]: 'CCC'

Jason Biggs
[Rdkit-discuss] transformation of an atom or group of atoms by atom indices
RDKit Discussion Group,

Suppose I have a molecule

    smiles1 = ''

The carbon atoms will be assigned indices 0, 1, 2, and 3. Suppose I want to specifically change carbon 3 to a hydrogen. Is this possible using RDKit?

I am aware of using SMARTS to match a pattern and then change that group of atoms. In my example, I would like to be able to change an atom or group of atoms based on the atom indices, not a SMARTS pattern, which might be problematic for molecules with local symmetry.

Regards,
Jim Metz
Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match
Brian, Greg, and David,

Thank you for your suggestions. I will try to respond to your questions and comments.

I am trying to reproduce results from a literature paper that used non-Python and non-RDKit code to identify certain patterns in molecules as part of a group contribution scheme resulting in the prediction of thermodynamic quantities. I have a training set of molecules and the results of calculations for that training set (individual counts of groups of atoms and resulting energies). Hence, my first goal is to reproduce the results reported for that training set, but using Python and RDKit.

Since my goal is to reproduce literature results as closely as possible, I am not in a position to debate the logic of the original authors in their assignments of SMARTS/SMILES matching and counts. After this initial goal is met, I might consider alternative pattern matching and counting schemes and compare those results to the literature results. In fact, that would be good science.

As I mentioned in my first email on this topic, I do think I have come up with a "rule" that will give me the correct answer (I have tried it for 8 cases using pencil and paper); my challenge is to code up the "rule" in Python. I am a beginner at Python, so I am struggling to get this idea into functional, bug-free code. Peter Shenkin's idea/code is getting close to what needs to be done, but doesn't handle all the cases.

Regards,
Jim Metz

-----Original Message-----
From: Brian Cole <col...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Nov 7, 2017 7:23 pm
Subject: Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches based upon the atom symmetry like this:

    from rdkit import Chem

    def count_unique_substructures(smiles, smarts):
        mol = Chem.MolFromSmiles(smiles)
        # Symmetry-equivalent atoms share a rank when ties are not broken.
        ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
        pattern = Chem.MolFromSmarts(smarts)
        unique_sets_of_atoms = set()
        for match in mol.GetSubstructMatches(pattern):
            match_ranks = frozenset([ranks[idx] for idx in match])
            unique_sets_of_atoms.add(match_ranks)
        return len(unique_sets_of_atoms)

However, this returns 1 for each of your cases. It's not clear to me why you would want your 2nd case to return 2, as all paths from a chlorine to a chlorine through 2 carbons are symmetric.

    >>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
    >>> smiles1 = 'ClC(Cl)CCl'
    >>> smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'
    >>> count_unique_substructures(smiles1, SMARTS)
    1
    >>> count_unique_substructures(smiles2, SMARTS)
    1

-Brian
Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match
Peter,

Thank you for your suggestions and accompanying code. I have modified your code slightly and have created 3 tuples for testing.

Your code works for the tuples match1 and match2, but does not work for match3. The code should return 2 for match3, because there are 2 sets of 3 tuples, each containing 4 atom indices. Using my "rule" that "if 3 indices are the same, they are in one group, and one must form the groups of the largest possible size", one arrives at 2 groups. The merge function should then select one tuple from each group, resulting in a count of 2 (the final number of groups).

Keep in mind that I will not know how many groups of tuples will be created for any given molecule. Hence, I cannot use hard-coded array indices. Any ideas how to modify the code below to obtain the desired result for the tuple match3, and how to deal with tuples of various sizes?

Regards,
Jim Metz

    def merge2(matches):
        if len(matches) > 1:
            d = {}
            for match in matches:
                t = (matches[0], matches[1])
                if (matches[0] < matches[1]):
                    t = (matches[0], matches[1])
                else:
                    t = (matches[1], matches[0])
                d[t] = match
            merged_match = (d[t],)
        else:
            merged_match = matches
        count = len(merged_match)
        return(count)

    match1 = ((0,2,3,4),)
    match2 = ((0,2,3,4), (1,2,3,4))
    match3 = ((0,2,4,5), (1,2,5,6), (2,3,4,5), (2,3,5,6), (0,2,5,6), (1,2,4,5))

    matches = match2  # Change the number to test different tuples
    output = merge2(matches)
    print("Output is ", output)

-----Original Message-----
From: Peter S. Shenkin <shen...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Nov 7, 2017 7:05 pm
Subject: Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

I think you probably used a slightly different SMILES than the one you showed. The one you showed should have given ((0,1,3,4),(2,1,3,4)).

The proper merge rule would then be to consider all matches equivalent if the 2nd and 3rd atom in the match agree, in any order; i.e., the two carbons, indices 1 and 3 in this case. So to do this, for each molecule, do something like this:

    d = {}
    for match in matches:
        if match[1] < match[2]:
            t = (match[1], match[2])
        else:
            t = (match[2], match[1])
        d[t] = match

You will wind up with as many dictionary elements as there are unique matches.

-P.
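The merge rule Peter Shenkin describes in this thread (matches are equivalent when their 2nd and 3rd atoms agree in any order) can be sketched as a small grouping function. This is a plain-Python illustration, not code from the thread: `group_by_central_pair` is a hypothetical name, and it assumes each match lists its atoms in SMARTS order, so that positions 1 and 2 are the two carbons.

```python
def group_by_central_pair(matches):
    """Group SMARTS matches by their unordered central atom pair.

    matches: iterable of 4-tuples in SMARTS order (Cl, C, C, Cl).
    Returns a dict mapping (low_carbon_idx, high_carbon_idx) to the
    list of matches sharing that carbon pair.
    """
    groups = {}
    for match in matches:
        key = tuple(sorted((match[1], match[2])))
        groups.setdefault(key, []).append(match)
    return groups


if __name__ == "__main__":
    matches = ((0, 1, 3, 4), (2, 1, 3, 4))
    groups = group_by_central_pair(matches)
    print(len(groups), groups)
```

The number of keys is the merged count under this rule; keeping the full list per key makes it easy to inspect which matches were folded together.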
[Rdkit-discuss] Python code to merge tuples from a SMARTS match
RDKit Discussion Group,

I have written a SMARTS to detect vicinal chlorine groups using RDKit. There are 4 atoms involved in a vicinal chlorine group.

    SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'

I am trying to count the number of ("unique") occurrences of this pattern. For some molecules with symmetry, this results in over-counting. For the molecule smiles1 below, I want to obtain a count of 1, i.e., 1 tuple of 4 atoms.

    smiles1 = 'ClC(Cl)CCl'

However, using the SMARTS above, I obtain 2 tuples of 4 atoms. Beginning with a MOL file representation of smiles1, I get

    ((1,2,4,3), (0,2,4,3))

One possible solution is to somehow merge the two tuples according to a "rule." One rule that works is "if 3 of the atom indices are the same, then combine into one tuple." However, the rule needs a bit of modification for more complicated cases (higher symmetry). Consider

    smiles2 = 'ClC(Cl)CCl(Cl)(Cl)'

My goal is to get 2 tuples of 4 atoms for smiles2. smiles2 is somewhat tricky because there are either 2 groups of 3 (4-atom) tuples, or 3 groups of 2 (4-atom) tuples, depending on how you choose your 3 atom indices. Again, if my goal is to get 2 tuples, then I need to somehow pick the largest group, i.e., 2 groups of 3 tuples, to do the merge operation, which will give me 2 remaining groups (desired).

I have already checked Stack Overflow and a few other places for Python code to do the necessary merging, but I could not find anything specific and appropriate. I would be most grateful if anyone has ideas how to do this. I suspect the answer is a few lines of well-written Python code, and not modifying the SMARTS (I could be mistaken!). Thank you.

Regards,
Jim Metz
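One possible reading of the "largest group" rule above is that, for each pair of adjacent carbons, the merged count equals the smaller of the two chlorine counts on those carbons (2 chlorines facing 3 gives 2 groups of 3 tuples, hence a count of 2). The sketch below implements that reading in plain Python. It is an interpretation offered for discussion, not the scheme from the literature paper, and `count_vicinal_groups` is a hypothetical name. It assumes each match is in SMARTS order (Cl, C, C, Cl), possibly reversed.

```python
from collections import defaultdict


def count_vicinal_groups(matches):
    """Count merged vicinal-chlorine groups under the 'largest group' reading.

    For every unordered carbon pair, collect the distinct chlorines on each
    carbon and add min(n_low, n_high) to the total.
    """
    cl_on_low = defaultdict(set)
    cl_on_high = defaultdict(set)
    for match in matches:
        # Normalize orientation so the lower-index carbon comes first.
        if match[1] > match[2]:
            match = match[::-1]
        pair = (match[1], match[2])
        cl_on_low[pair].add(match[0])
        cl_on_high[pair].add(match[3])
    return sum(min(len(cl_on_low[p]), len(cl_on_high[p])) for p in cl_on_low)


if __name__ == "__main__":
    # The two tuples reported above for smiles1.
    print(count_vicinal_groups(((1, 2, 4, 3), (0, 2, 4, 3))))
    # A hypothetical 3-by-2 case: chlorines 0, 1, 3 on carbon 2; 4, 6 on carbon 5.
    print(count_vicinal_groups([(a, 2, 5, b) for a in (0, 1, 3) for b in (4, 6)]))
```

Whether this matches the published group counts for every training-set molecule would need to be checked case by case against the paper's results.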
[Rdkit-discuss] ctest
RDKit Experts,

What is ctest? Is it fully documented somewhere? I am running Python 3.5.2/RDKit 2017.03.1 on Windows 7 via PyCharm 2017.2.3. Is it possible to run ctest in this environment?

ctest seems to be a good way to exercise and test many of the capabilities of RDKit, some of which I am not familiar with. Thank you.

Regards,
Jim Metz
[Rdkit-discuss] 2017.09.1 RDKit release
Many thanks to all for this new RDKit release. I have used commercial software in the past, and I am more and more impressed by the capabilities of RDKit and other open source software. Please keep up the good work!

Regards,
Jim Metz

-----Original Message-----
From: rdkit-discuss-request
To: rdkit-discuss
Sent: Sun, Oct 8, 2017 7:33 am
Subject: Rdkit-discuss Digest, Vol 120, Issue 22

Message: 1
Date: Sun, 8 Oct 2017 14:32:54 +0200
From: Greg Landrum
To: RDKit Discuss, rdkit-annou...@lists.sourceforge.net, RDKit Developers List
Subject: [Rdkit-discuss] 2017.09.1 RDKit release

I'm pleased to announce that the next version of the RDKit -- 2017.09 -- is released. The release notes are below.

The release files are on the github release page: https://github.com/rdkit/rdkit/releases/tag/Release_2017_09_1

Binaries have been uploaded to anaconda.org (https://anaconda.org/rdkit). The available conda binaries for this release are:
Linux 64bit: python 2.7, 3.5, 3.6
Mac OS 64bit: python 2.7, 3.5, 3.6
Windows 64bit: python 2.7, 3.5, 3.6

I left out the Win32 python2.7 build this time. If that's important to someone, let me know and I'll see if I can get it working again.

Some notes on the conda builds:
- These builds are tested with conda v4.3.25 and, as of the release date, are very unlikely to work with anything newer than that. This thread has more information on that: https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg07315.html
- The conda builds now depend on numpy 1.13 instead of 1.11.
- The Mac and Linux builds now use v1.63 of boost. The rdkit conda channel has the appropriate binaries.

Some things that will be finished over the next couple of days:
- The conda build scripts will be updated to reflect the new version and new conda builds will be available in the RDKit channel at anaconda.org (https://anaconda.org/rdkit).
- The homebrew script
- The online version of the documentation at rdkit.org

Thanks to everyone who submitted bug reports and suggestions for this release!

Please let me know if you find any problems with the release or have suggestions for the next one, which is scheduled for March 2018.

Best Regards,
-greg

# Release_2017.09.1
(Changes relative to Release_2017.03.1)

## Important
- The fix for bug #1567 changes the way fragment SMILES are canonicalized. MolFragmentToSmiles() and canonicalizeFragment() will now often return different results.
- The fix for bug #1604 changes the behavior of QueryAtom::setQuery(), which now deletes the current query before setting the new value. If you are using QueryAtom::setQuery() from C++ (or possibly Java), be sure that you are not also deleting that memory.

## Acknowledgements:
Brian Cole, Peter Gedeck, Guillaume Godin, Jan Halborg Jensen, Malitha Kabir, Tuomo Kalliokoski, Brian Kelley, Noel O'Boyle, Matthew O'Meara, Pavel Polishchuk, Cameron Pye, Christian Ribeaud, Stephen Roughley, Patrick Savery, Roger Sayle, Nadine Schneider, Gregor Simm, Matt Swain, Paolo Tosco, Alain Vaucher, Sam Webb, 'phenethyl', 'xiaotaw'

## Highlights:
- The new R-Group decomposition code provides a flexible and powerful tool for building R-group tables or datasets. Look in $RDBASE/Docs/Notebooks for example notebooks showing how to use this.
- Drawing of chemical reactions has been greatly improved and is now done using the C++ rendering code.
- The MaxMinPicker is dramatically faster.
- New descriptors: the QED descriptor has been added, as have a large collection of new 3D descriptors and implementations of the USR and USRCAT fingerprints.

## New Features and Enhancements:
- Bring back USR and USRCAT descriptors (github pull #1417 from greglandrum)
- Generate a warning for conflicting bond directions (github issue #1423 from greglandrum)
- expose and test GetDrawCoords() (github pull #1427 from greglandrum)
- Improvement suggestions for SaltRemover (github issue #1431 from ribeaud)
- Remove obsolete scripts from
Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
Chris,

Wow! Your recursive SMARTS expression works as needed! Hmmm... Help me understand this better: it looks like you "walk around" the ring of the substructure we want to exclude and employ a slightly different recursive SMARTS beginning at that atom. Is that correct?

Also, since my situation is likely to get more complicated with respect to exclusions, suppose I still wanted to utilize the general aromatic expression for a 6-membered ring, i.e., [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to exclude the structures we have been discussing, and I also wanted to exclude pyridine, i.e., [n]1:[c]:[c]:[c]:[c]:[c]1. Is there a SMARTS expression that would capture 2 exclusions?

Perhaps this is getting too clumsy! It might be better to have one or more inclusion SMARTS and one or more exclusion SMARTS, and write code to remove those groups of atoms that are coming from the exclusion SMARTS. Any ideas for Python/RDKit code? Something like

    test_smiles = 'c1ccccc1'
    inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
    exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
    etc...

Hmmm... any other ideas, suggestions, comments? Thanks again.

Regards,
Jim Metz

-----Original Message-----
From: Chris Earnshaw <cgearns...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
Sent: Sun, Sep 24, 2017 4:01 am
Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

Hi Jim,

It can be done with recursive SMARTS, though the syntax is a bit painful. This may do what you want:

    [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1

It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1, with recursive SMARTS applied to the first atom to ensure that this can't match any of the 6 ring atoms in your undesired system.

Regards,
Chris Earnshaw

On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:
> Hello,
>
> Suppose I have the following molecule
>
> m = 'CN1C=CC(=O)NC1=O'
>
> I would like to be able to use a SMARTS pattern
>
> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>
> to recognize the 6 atoms in a typical aromatic ring, but
> I do not want to recognize the 6 atoms in the molecule,
> m, as aromatic. In other words, I am trying to write
> a specific exclusion.
>
> Is it possible to modify the SMARTS pattern to
> exclude the above molecule? I have tried using
> recursive SMARTS, but I can't get the syntax to
> work.
>
> Any ideas? Thank you.
>
> Regards,
> Jim Metz
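A minimal sketch of the inclusion/exclusion idea raised in this thread, assuming the match tuples have already been obtained from RDKit (for instance by calling GetSubstructMatches() with the inclusion and the exclusion pattern). The helper name filter_matches and the atom indices below are hypothetical and only illustrate the set logic; nothing here is taken from the original messages.

```python
def filter_matches(inclusion_matches, exclusion_matches):
    """Keep inclusion matches whose atom set is not also an exclusion match.

    Both arguments are iterables of tuples of atom indices, in the shape
    returned by a substructure search. Matches are compared as unordered
    atom sets, so the starting atom of a ring match does not matter.
    """
    excluded = {frozenset(match) for match in exclusion_matches}
    return [tuple(match) for match in inclusion_matches
            if frozenset(match) not in excluded]


# Hypothetical indices: two six-membered aromatic rings were found by the
# inclusion pattern, and the second ring was also found by the exclusion
# pattern (reported starting from a different atom).
inclusion = ((1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12))
exclusion = ((10, 11, 12, 7, 8, 9),)

kept = filter_matches(inclusion, exclusion)
print(kept)
```

Comparing frozensets rather than raw tuples is the design choice that matters here: two different SMARTS can report the same ring with the atoms listed in a different order.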
Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?
Greg,

You are suggesting some interesting ideas. Probably matching of atoms in 5- and 6-membered aromatic rings will be sufficient for now. I was initially stumped trying to figure out an elegant way to deal with aromatic N's, O's, and S's in various combinations. The usage of "a" in SMARTS is powerful in this regard. Thanks again.

Regards,
Jim Metz

-----Original Message-----
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Jason Biggs <jasondbi...@gmail.com>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Thu, Sep 21, 2017 12:32 am
Subject: Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?

My approach to this would depend on what you're trying to accomplish in the end.

If you just want all the aromatic atoms you can just use "[a]". Unless you do some extra work when you read in the molecules, any aromatic atom will be in a ring. If you want to be really sure, you can do "[a;r]".

If you want all the aromatic bonds, it's "[a]:[a]".

If you want the rings themselves and you want to just use SMARTS, you have to enumerate. Python makes getting the patterns pretty easy:

    In [8]: patts = ["[a]:1"+":[a]"*i+":[a]:1" for i in range(3,22)] # 24 is the max aromatic ring size
    In [9]: patts[:3]
    Out[9]: ['[a]:1:[a]:[a]:[a]:[a]:1', '[a]:1:[a]:[a]:[a]:[a]:[a]:1', '[a]:1:[a]:[a]:[a]:[a]:[a]:[a]:1']

The rest is just some calls to MolFromSmarts() and then mol.GetSubstructMatches() for the molecules you want to test.

-greg
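A small sketch of the pattern enumeration described in this thread, written so that the ring size is explicit. The function name aromatic_ring_smarts is hypothetical. Note that in the list comprehension quoted in the thread the loop variable i counts only the middle atoms, so i = 3 gives a 5-membered ring and range(3, 22) covers ring sizes 5 through 23.

```python
def aromatic_ring_smarts(ring_size):
    """Build a SMARTS string for a ring of `ring_size` aromatic atoms.

    The first and last atoms carry the ring-closure digit 1; the
    remaining ring_size - 2 atoms sit in between, all joined by
    aromatic bonds (':').
    """
    if ring_size < 3:
        raise ValueError("a ring needs at least 3 atoms")
    return "[a]:1" + ":[a]" * (ring_size - 2) + ":[a]:1"


# Patterns keyed by ring size; 5 to 7 is an arbitrary choice for illustration.
patterns = {size: aromatic_ring_smarts(size) for size in range(5, 8)}
for size, smarts in patterns.items():
    print(size, smarts)
```

Each string could then be passed to MolFromSmarts() and used with GetSubstructMatches(), as suggested in the thread.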
Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?
Jason,

Thanks! I just thought of that for a 6-membered ring. A 5-membered ring would be [a]1[a][a][a][a]1.

Hmmm... I was thinking of using "r" to specify a ring, but I don't think that would be necessary. Correct?

Regards,
Jim Metz

-----Original Message-----
From: Jason Biggs <jasondbi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Wed, Sep 20, 2017 8:36 pm
Subject: Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?

If you don't care what type of atom it is, just that it's aromatic, you should use [a], so [a]1[a][a][a][a][a]1 would match any 6-membered aromatic ring.

Jason Biggs
[Rdkit-discuss] SMARTS for heteroaromatic rings?
Hello,

I would like to write a SMARTS that will match all of the individual atoms in all possible heteroaromatic rings. Does anyone know of an elegant, compact way to do this?

If one SMARTS will not work, I can concatenate SMARTS using a vertical pipe, "|", as I proposed in an earlier message in this forum.

I am (perhaps) expecting SMARTS something like

    [c]1[c][n][c][c]1
    etc.
    [c]1[c][c][c][c][c]1
    [c]1[c][n][c][c][c]1
    etc.

Perhaps there is a very elegant way to specify the possible patterns. I can't think of a way to do it, other than exhaustive enumeration. Any ideas?

Regards,
Jim Metz
Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR
Chris,

Thank you for your interesting suggestion, but it is not quite what I need. For example, consider the molecule

    m = Chem.MolFromSmiles("CCNN")

I am looking for one SMARTS that, using the SMARTS pattern matching capability in RDKit, would return 2 groups, each group containing the two atoms corresponding to CC and NN. With your suggested recursive SMARTS and the code below

    pattern = Chem.MolFromSmarts('[$(C-C),$(N-N)]')
    match = m.GetSubstructMatches(pattern)

match returns ((0,), (1,), (2,), (3,)).

The output I am trying to achieve, instead, is ((0,1), (2,3)).

Is there a single SMARTS that will do that?

Regards,
Jim Metz

-----Original Message-----
From: Chris Earnshaw <cgearns...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Sep 19, 2017 10:13 am
Subject: Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR

Hi,

Will the recursive SMARTS [$(C-C),$(N-N)] not do the job? I'd parse this in English as 'an atom which is EITHER an aliphatic carbon singly bonded to an aliphatic carbon OR an aliphatic nitrogen singly bonded to an aliphatic nitrogen'.

Regards,
Chris Earnshaw
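A minimal sketch of the fallback discussed in this thread: run one substructure search per SMARTS, then combine the resulting groups and drop identical ones. The helper name merge_matches is hypothetical, and the tuples below are hand-written stand-ins for what separate searches with '[C]-[C]' and '[N]-[N]' would be expected to return for CCNN, not captured RDKit output.

```python
def merge_matches(*match_lists):
    """Concatenate several match lists, dropping groups already seen.

    Each argument is an iterable of tuples of atom indices. Two groups
    count as identical when they contain the same atoms in any order.
    The first occurrence of each group is kept, in order of appearance.
    """
    seen = set()
    merged = []
    for matches in match_lists:
        for match in matches:
            key = frozenset(match)
            if key not in seen:
                seen.add(key)
                merged.append(tuple(match))
    return tuple(merged)


# Stand-in results for the molecule CCNN (atoms 0-1 are C, atoms 2-3 are N).
cc_matches = ((0, 1),)
nn_matches = ((2, 3),)

groups = merge_matches(cc_matches, nn_matches)
print(groups)
```

Because each pattern is searched on its own, cross matches such as C-N never appear, which is the property the single-SMARTS attempts in this thread were trying to achieve.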
Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR
Dante,

Yes, in principle, if one can figure out all of the possible undesired cross matches.

Since my goal is to do this in RDKit and generate groups of atoms that match, perhaps one approach is to simply use multiple RDKit pattern matching statements (with multiple SMARTS), generate the groups of atoms, then combine the lists, removing identical groups.

Hmmm... Is there a more straightforward (elegant) solution?

Regards,
Jim Metz

-----Original Message-----
From: Dante <dante.esgrimi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Sep 19, 2017 8:45 am
Subject: Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR

Hi Jim,

Could you use the 'NOT' logical operator (!) in combination with recursive SMARTS to eliminate the cross-matches?

Cheers,
Dante
[Rdkit-discuss] single SMARTS for two patterns with Boolean OR
Hello,

Is it possible to write a single SMARTS for two separate patterns involving a Boolean OR?

For example, I want to write a single SMARTS that can match the patterns of

    [C]-[C] or [N]-[N]

I realize that I could write something like

    [C,N]-[C,N]

but that would also match "cross" patterns such as CN and NC, which I don't want.

I have tried to write

    ([C]-[C]), ([N]-[N])

but I have not been able to get that syntax or related expressions (variations of parentheses, brackets, etc.) to work.

Hence, if someone knows how to combine separate SMARTS expressions into a single expression with a Boolean OR, I would be grateful. Thank you.

Regards,
Jim Metz
Re: [Rdkit-discuss] how to output multiple Kekule structures
Paolo,

Exactly what I was looking for. Very helpful. Thank you.

Regards,
Jim Metz

-----Original Message-----
From: Paolo Tosco <paolo.to...@unito.it>
To: James T. Metz <jamestm...@aol.com>; greg.landrum <greg.land...@gmail.com>; rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 2:53 pm
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures

Hi Jim,

you can indeed enumerate all Kekulé structures for a molecule within the RDKit using Chem.ResonanceMolSupplier():

    >>> from rdkit import Chem
    >>> mol = Chem.MolFromSmiles('c1ccccc1')
    >>> suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
    >>> len(suppl)
    2
    >>> for i in range(len(suppl)):
    ...     print(Chem.MolToSmiles(suppl[i], kekuleSmiles=True))
    C1C=CC=CC=1
    C1=CC=CC=C1

Best,
Paolo
Re: [Rdkit-discuss] how to output multiple Kekule structures
Greg,

Thanks! Yes, very helpful. I will need to digest the detailed information you have provided. I am somewhat familiar with recursive SMARTS. Thanks again.

Regards,
Jim Metz

-----Original Message-----
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures

On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz <jamestm...@aol.com> wrote:
> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules that can be aromatic, and I need to be able to handle cases where
> there can be differences in the way that the molecule was entered or drawn
> by a user.

That particular problem is a big part of the reason that we tend to use the aromatic representation of things.

> For example, consider the following alkenyl-substituted pyridine; there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'

Fixing what I assume is a typo for m2, I can do the following:

    In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
    In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')
    In [13]: q1 = Chem.MolFromSmarts('cccc')
    In [14]: q2 = Chem.MolFromSmarts('cccn')
    In [15]: list(m1.GetSubstructMatch(q1))
    Out[15]: [2, 7, 6, 5]
    In [16]: list(m1.GetSubstructMatch(q2))
    Out[16]: [6, 5, 4, 3]
    In [17]: list(m2.GetSubstructMatch(q1))
    Out[17]: [2, 7, 6, 5]
    In [18]: list(m2.GetSubstructMatch(q2))
    Out[18]: [6, 5, 4, 3]

Those particular queries were going for the aromatic species and will only match inside the ring, but if you want to be more generic you could tune your queries like this:

    In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')
    In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')
    In [30]: list(m1.GetSubstructMatch(q3))
    Out[30]: [0, 1, 2, 7]
    In [31]: list(m1.GetSubstructMatch(q4))
    Out[31]: [0, 1, 2, 3]
    In [32]: list(m2.GetSubstructMatch(q3))
    Out[32]: [0, 1, 2, 7]
    In [33]: list(m2.GetSubstructMatch(q4))
    Out[33]: [0, 1, 2, 3]

If you aren't familiar with recursive SMARTS, this construct: "[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an aromatic bond to another atom". So you can interpret q3 as "four carbons that each have either a double or aromatic bond and that are connected to each other by single, double, or aromatic bonds".

Is this starting to approximate what you're looking for?

-greg

> Now consider two SMARTS
>
> pattern1 = '[C]=[C]-[C]=[C]'
> pattern2 = '[C]=[C]-[C]=[N]'
>
> I need to be able to detect the existence of each pattern in the molecule.
>
> If m1 is the only available generated Kekule structure, then pattern2 will
> be recognized.
>
> If m2 is the only available generated Kekule structure, then pattern1 will
> be recognized.
>
> Hence, I am getting different answers for the same input molecule just
> because it was drawn in different Kekule structures.
>
> Regards,
> Jim Metz
Re: [Rdkit-discuss] how to output multiple Kekule structures
Greg,

I need to be able to use SMARTS patterns to identify substructures in molecules that can be aromatic, and I need to be able to handle cases where there can be differences in the way that the molecule was entered or drawn by a user.

For example, consider the following alkenyl-substituted pyridine; there are two possible Kekule structures

    m1 = 'C=CC1=NC=CC=C1'
    m2 = 'C=CC1N=CC=CC1'

Now consider two SMARTS

    pattern1 = '[C]=[C]-[C]=[C]'
    pattern2 = '[C]=[C]-[C]=[N]'

I need to be able to detect the existence of each pattern in the molecule.

If m1 is the only available generated Kekule structure, then pattern2 will be recognized.

If m2 is the only available generated Kekule structure, then pattern1 will be recognized.

Hence, I am getting different answers for the same input molecule just because it was drawn in different Kekule structures.

Regards,
Jim Metz

-----Original Message-----
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures

Hi Jim,

The code currently has no way to enumerate Kekule structures. I don't recall this coming up in the past and, to be honest, it doesn't seem all that generally useful.

Perhaps there's an alternate way to solve the problem; what are you trying to do?

-greg
[Rdkit-discuss] how to output multiple Kekule structures
Hello,

Suppose I read in an aromatic SMILES, e.g., for benzene

    c1ccccc1

I would like to generate the major canonical resonance forms and save the results as two separate molecules. Essentially I am trying to generate

    m1 = 'C1=CC=CC=C1'
    m2 = 'C1C=CC=CC=1'

Can this be done in RDKit? I have found a KEKULE_ALL option in the detailed documentation which seems to be what I am trying to do, but I don't understand how this option is to be used, or the proper syntax.

If it is necessary to somehow renumber the atoms and re-generate Kekule structures, that is OK. Thank you.

Regards,
Jim Metz
[Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Hello,

Suppose I read in the SMILES of an aromatic molecule, e.g., for benzene

    c1ccccc1

I then want to convert the molecule to a Kekule representation and then perform various SMARTS pattern recognition, e.g.,

    [C]=[C]-[C]

I have tried various Kekule commands in RDKit, but I cannot figure out how (or whether it is possible) to recognize a SMARTS pattern for a portion of a molecule which is aromatic, but is currently being stored as a Kekule structure.

Also, is it possible to generate and store more than one Kekule form in RDKit?

Thank you.

Regards,
Jim Metz
Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 119, Issue 6
TJ, Your suggestion solved my problem. Thanks! I need to pay closer attention to the SMARTS documentation! Regards, Jim Metz -Original Message- From: rdkit-discuss-request <rdkit-discuss-requ...@lists.sourceforge.net> To: rdkit-discuss <rdkit-discuss@lists.sourceforge.net> Sent: Wed, Sep 6, 2017 9:17 pm Subject: Rdkit-discuss Digest, Vol 119, Issue 6 Send Rdkit-discuss mailing list submissions to rdkit-discuss@lists.sourceforge.net To subscribe or unsubscribe via the World Wide Web, visit https://lists.sourceforge.net/lists/listinfo/rdkit-discuss or, via email, send a message with subject or body 'help' to rdkit-discuss-requ...@lists.sourceforge.net You can reach the person managing the list at rdkit-discuss-ow...@lists.sourceforge.net When replying, please edit your Subject line so it is more specific than "Re: Contents of Rdkit-discuss digest..." Today's Topics: 1. Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens (James T. Metz) 2. Re: Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens (TJ O'Donnell) 3. Re: Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens (TJ O'Donnell) -- Message: 1 Date: Wed, 6 Sep 2017 19:34:02 -0400 From: "James T. Metz" <jamestm...@aol.com> To: RDkit-discuss@lists.sourceforge.net, jamestm...@aol.com Subject: [Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens Message-ID: <15e598b1b33-c09-48...@webjas-vab043.srv.aolmail.net> Content-Type: text/plain; charset="utf-8" Hello, Given the following SMILES for a macrocyclic hexaose OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O can anyone suggest a SMARTS pattern that will distinguish ether oxygens in the smaller 6-membered rings versus the ethers in the larger macrocyclic structure? 
For example, using RDkit, I have tried (e.g., pattern = Chem.MolFromSmarts('[O;H0;D2]') )

[O;H0;D2] ===> gives 12 matches (all ether oxygens)
[O;H0;D2;R] ===> gives 12 matches (all ether oxygens)
[O;H0;D2;!R] ===> gives 0 matches
[O;H0;D2;R6] ===> gives 0 matches

I am stumped. Any ideas? If it is necessary to write more complicated Python/RDkit/SMARTS code, I am certainly willing to try that.

Thanks!

Regards,
Jim Metz
Northwestern University

Message: 2
Date: Wed, 6 Sep 2017 18:04:01 -0700
From: "TJ O'Donnell" <t...@acm.org>
To: "James T. Metz" <jamestm...@aol.com>
Cc: RDKit Discuss <RDkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens

Try using [O;H0;D2;r6], with a lower-case r. Sorry, I'm not at a computer to check this. R6 means "in 6 rings"; r6 means "in a ring of size 6".

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

TJ O'Donnell
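The R versus r distinction described in Message 2 can be checked with a short script. This is only a sketch: the helper name count_matches is made up for illustration, the SMILES is copied from Message 1, and the negated pattern [O;H0;D2;!r6] is an assumed complement that does not appear in the thread.

```python
from rdkit import Chem

# Macrocyclic hexaose SMILES from Message 1, split only to keep lines short.
smiles = ("OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)"
          "OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O")
mol = Chem.MolFromSmiles(smiles)


def count_matches(smarts):
    """Hypothetical helper: number of substructure matches for a SMARTS string."""
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


all_ethers = count_matches("[O;H0;D2]")          # every ether oxygen
six_ring = count_matches("[O;H0;D2;r6]")         # smallest ring has size 6
macrocycle_only = count_matches("[O;H0;D2;!r6]")  # ether oxygens not in a 6-ring

print("all ether oxygens:        ", all_ethers)
print("in a 6-membered ring (r6):", six_ring)
print("macrocycle only (!r6):    ", macrocycle_only)

# Cross-check using ring info: smallest ring size around each ether oxygen.
ring_info = mol.GetRingInfo()
ether = Chem.MolFromSmarts("[O;H0;D2]")
sizes = sorted({ring_info.MinAtomRingSize(m[0])
                for m in mol.GetSubstructMatches(ether)})
print("smallest ring sizes seen: ", sizes)
```

Lower-case r<n> tests the size of the smallest ring containing the atom, whereas upper-case R<n> counts how many rings the atom belongs to, which is why [O;H0;D2;R6] returned no matches in the original attempt.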
[Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens
Hello,

Given the following SMILES for a macrocyclic hexaose

OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O

can anyone suggest a SMARTS pattern that will distinguish ether oxygens in the smaller 6-membered rings versus the ethers in the larger macrocyclic structure?

For example, using RDkit, I have tried (e.g., pattern = Chem.MolFromSmarts('[O;H0;D2]') )

[O;H0;D2] ===> gives 12 matches (all ether oxygens)
[O;H0;D2;R] ===> gives 12 matches (all ether oxygens)
[O;H0;D2;!R] ===> gives 0 matches
[O;H0;D2;R6] ===> gives 0 matches

I am stumped. Any ideas? If it is necessary to write more complicated Python/RDkit/SMARTS code, I am certainly willing to try that.

Thanks!

Regards,
Jim Metz
Northwestern University