Hi Pat, I wrote something like this for mmpdb, which is the MMPA code I helped develop, at https://github.com/rdkit/mmpdb .
It has one restriction, which I'll get to in a moment.

The general idea is to convert the attachment points to closures, join them with a ".", and canonicalize:

>>> from mmpdblib import smiles_syntax
>>> s1 = smiles_syntax.convert_labeled_wildcards_to_closures("CN(C)CC(Br)c1cc([*:2])c([*:1])cn1")
>>> s1
'CN(C)CC(Br)c1cc%92c%91cn1'
>>> s2 = smiles_syntax.convert_labeled_wildcards_to_closures("[H]C([*:1])([H])[H].[H][*:2]")
>>> s2
'[H]C%91([H])[H].[H]%92'
>>> from rdkit import Chem
>>> Chem.CanonSmiles(s1+"."+s2)
'Cc1ccc(C(Br)CN(C)C)nc1'

The smiles_syntax.py file does not use any of the rest of the code.

The restriction is that the code as-is assumes the wildcard atoms like [*:1] are either immediately before or after the attachment point. Otherwise it will give you (using the R-groups you actually posted):

>>> s2 = smiles_syntax.convert_labeled_wildcards_to_closures("[H]C([H])([H])[*:1].[H][*:2]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 130, in convert_labeled_wildcards_to_closures
    return convert_wildcards_to_closures(new_smiles, offsets)
  File "/Users/dalke/cvses/mmpdb/mmpdblib/smiles_syntax.py", line 98, in convert_wildcards_to_closures
    new_smiles, new_smiles[wildcard_start-1:])
NotImplementedError: ('intermediate groups not supported', '[H]C([H])([H])[*].[H][*]', ')[*].[H][*]')

All this means is I didn't write the code to count the number of intermediate branches/matched parentheses between the attachment point and the wildcard atom. ("Count" because I would need to invert any chirality on the base atom if there were an odd number of intermediate groups.)

Such code wouldn't be hard to add. It's not there because my experience is that RDKit only placed the "*" atoms in one of those two locations.
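To make the rewriting step concrete, here is a minimal pure-Python sketch of the wildcard-to-closure idea. It is not mmpdb's implementation: the names wildcards_to_closures and weld are hypothetical, the %9n closure numbering is simply copied from the output shown above, and it is stricter than mmpdb in that it only accepts a wildcard that comes immediately after its attachment atom (either bare or alone in its own branch). Anything else raises NotImplementedError, in the same spirit as the traceback above.

```python
import re

# Hypothetical helper names; this is an illustration, not mmpdb's code.
# Matches either "([*:n])" (wildcard alone in a branch) or a bare "[*:n]".
_WILDCARD = re.compile(r"\(\[\*:(\d+)\]\)|\[\*:(\d+)\]")


def _closure(label):
    # Map label n to the ring closure %9n (numbering assumed from the
    # mmpdb output quoted in the email).
    n = int(label)
    if not 1 <= n <= 9:
        raise ValueError(f"only labels 1-9 are handled in this sketch: {label!r}")
    return "%9" + str(n)


def wildcards_to_closures(smiles):
    """Rewrite labeled wildcards such as [*:1] as ring closures such as %91."""

    def replace(match):
        start = match.start()
        prev = match.string[start - 1] if start > 0 else ""
        # The closure must land directly after the attachment atom: the
        # previous character has to end an atom ("]" or an element letter)
        # or be one of that atom's ring-closure digits.
        if not (prev.isalnum() or prev == "]"):
            raise NotImplementedError(
                f"wildcard does not directly follow its attachment atom: "
                f"{match.string[max(start - 1, 0):]!r}"
            )
        return _closure(match.group(1) or match.group(2))

    return _WILDCARD.sub(replace, smiles)


def weld(core, rgroups):
    """Convert every piece, then join with "." into one dot-disconnected SMILES."""
    return ".".join(wildcards_to_closures(s) for s in [core, *rgroups])
```

With the inputs from the session above, weld("CN(C)CC(Br)c1cc([*:2])c([*:1])cn1", ["[H]C([*:1])([H])[H]", "[H][*:2]"]) builds the same dot-joined string as s1+"."+s2, which can then be handed to Chem.CanonSmiles if RDKit is installed. Because the closure digit replaces the wildcard in the same position among the atom's neighbors, this restricted form does not need the chirality bookkeeping described above.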
However, as I just learned, if you leave the hydrogens in then the [H] atoms have priority:

>>> Chem.SanitizeMol(mol,Chem.SANITIZE_ALL^Chem.SANITIZE_CLEANUP^Chem.SANITIZE_PROPERTIES)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(mol)
'[H]C([H])([H])[*:1]'

Then again, explicit [H] atoms aren't important for your end goal, so you could just recanonicalize all of your R-groups first, to ensure they are in the RDKit form, then use the SMILES rewriter.

For what it's worth, I coined the term "welding" to describe this technique of converting the labeled R-groups into ring closures, then using "." to (dis)connect them so they parse as a single odd-looking SMILES.

Andrew
da...@dalkescientific.com

> On Apr 15, 2018, at 21:16, Patrick Walters <wpwalt...@gmail.com> wrote:
>
> Hi All,
>
> I was about to write a function to reassemble a molecule from a core +
> R-groups, but I thought I'd check and see if such a function already exists.
> This is to work with the output of rdRGroupDecomposition.
>
> Given a core:
> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
>
> Plus a set of R-groups
> [H]C([H])([H])[*:1]
> [H][*:2]
>
> Reconnect the pieces to generate a molecule
> CN(C)CC(Br)c1ccc(C)cn1
>
> Thanks,
>
> Pat

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss