On Mar 31, 2021, at 21:55, Ling Chan <lingtrek...@gmail.com> wrote:

> I am trying to do something that I think is quite simple, but I have not
> figured out a simple way. Don't know if I am missing something. I am sure
> that ultimately I can figure it out, but I wonder if there is a good way.
If you can work in SMILES space rather than molecule space, then try:

http://dalkescientific.com/smiles_weld.py

It's derived from a technique I developed for the mmpdb package, which I called 'welding' the SMILES strings. I convert the wildcards into ring closures, then let RDKit merge the closures. (There are a few tricky parts, like supporting double-bond stereochemistry.)

Here's an example, where I use a dictionary to tell the program that [1*] should be bonded to [2*]:

>>> from rdkit import Chem
>>> smi = "N#Cc1ccncc1"
>>> mol = Chem.MolFromSmiles(smi)
>>> frag_mol = Chem.FragmentOnBonds(mol, [1])
>>> frag_smi = Chem.MolToSmiles(frag_mol)
>>> frag_smi
'[1*]c1ccncc1.[2*]C#N'
>>> import smiles_weld
>>> smiles_weld.convert_wildcards_to_closures(frag_smi, {1: 1, 2: 1})
'c%991ccncc1.C%99#N'
>>> Chem.CanonSmiles('c%991ccncc1.C%99#N')
'N#Cc1ccncc1'

If you use matching dummy labels then you can omit the conversion table:

>>> frag_mol = Chem.FragmentOnBonds(mol, [1], dummyLabels=((4,4),))
>>> frag_smi = Chem.MolToSmiles(frag_mol)
>>> frag_smi
'[4*]C#N.[4*]c1ccncc1'
>>> smiles_weld.convert_wildcards_to_closures(frag_smi)
'C%99#N.c%991ccncc1'
>>> Chem.CanonSmiles('C%99#N.c%991ccncc1')
'N#Cc1ccncc1'

Note: while the mmpdb code is well-tested, I modified it this morning to handle what I think you want, and I haven't fully tested the new code.

The program assumes the SMILES is a canonical SMILES generated by RDKit, and that the wildcard labels don't have a charge, hydrogen count, or other attribute.

Cheers,

Andrew
da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
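For readers who want to see the shape of the "wildcards into ring closures" idea without downloading smiles_weld.py, here is a minimal plain-Python sketch. It is not the smiles_weld.py code. The function name weld_sketch, the closure numbering (99, 98, ... downward), and the restrictions (RDKit-style canonical SMILES, each [n*] wildcard bonded to exactly one atom, no stereochemistry at all) are assumptions made for this illustration. Each wildcard is removed and a %NN closure is appended to the atom it was bonded to, so two wildcards in the same group end up sharing a closure number.

```python
import re

# Hypothetical sketch of SMILES "welding"; NOT the smiles_weld.py code.
# Tokens: bracket atoms, organic-subset atoms, ring closures, bonds,
# branch parentheses and the "." fragment separator.
_TOKEN_RE = re.compile(
    r"\[[^\]]*\]|Br|Cl|[BCNOPSFIbcnops*]|%\d\d|\d|[-=#$:/\\]|[().]"
)
_WILDCARD_RE = re.compile(r"\[(\d*)\*\]")
_BONDS = set("-=#$:")


def _is_atom(tok):
    return tok[0] == "[" or tok[0].isalpha() or tok == "*"


def weld_sketch(smiles, label_groups=None):
    """Replace paired [n*] wildcards with %NN ring closures.

    Wildcards whose labels map to the same value in label_groups are
    joined; without a table, wildcards with equal labels are joined.
    Assumes degree-1 wildcards and no stereochemistry.
    """
    tokens = _TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES syntax")
    if any(t in ("/", "\\") or "@" in t for t in tokens):
        raise ValueError("this sketch does not handle stereochemistry")

    inserts = {}   # atom token index -> closure text appended after it
    drop = set()   # token indices removed from the output
    closures = {}  # group -> ring-closure number (99, 98, ...)
    counts = {}    # group -> number of wildcards seen

    for i, tok in enumerate(tokens):
        m = _WILDCARD_RE.fullmatch(tok)
        if m is None:
            continue
        label = int(m.group(1) or 0)
        group = label_groups.get(label, label) if label_groups else label
        drop.add(i)
        bond = ""
        j = i - 1
        if j >= 0 and tokens[j] in _BONDS:
            bond = tokens[j]
            drop.add(j)
            j -= 1
        if j < 0 or tokens[j] == ".":
            # Wildcard starts a fragment: it is bonded to the next atom.
            k = i + 1
            if k < len(tokens) and tokens[k] in _BONDS:
                bond = tokens[k]
                drop.add(k)
                k += 1
            if k >= len(tokens) or not _is_atom(tokens[k]):
                raise ValueError("wildcard is not followed by an atom")
            neighbor = k
        else:
            # "([n*])" branch: drop the now-empty parentheses too.
            if tokens[j] == "(" and i + 1 < len(tokens) and tokens[i + 1] == ")":
                drop.update((j, i + 1))
            # Walk back, skipping bonds, ring closures and complete
            # branches, to the atom the wildcard is attached to.
            depth = 0
            while depth > 0 or not _is_atom(tokens[j]):
                if tokens[j] == ")":
                    depth += 1
                elif tokens[j] == "(" and depth > 0:
                    depth -= 1
                j -= 1
            neighbor = j
        num = closures.setdefault(group, 99 - len(closures))
        if num < 10:
            raise ValueError("too many wildcard pairs for %NN closures")
        counts[group] = counts.get(group, 0) + 1
        inserts[neighbor] = inserts.get(neighbor, "") + bond + "%" + str(num)

    unpaired = [g for g, n in counts.items() if n != 2]
    if unpaired:
        raise ValueError(f"groups without exactly two wildcards: {unpaired}")
    return "".join(
        tok + inserts.get(idx, "")
        for idx, tok in enumerate(tokens)
        if idx not in drop
    )
```

On the two fragment SMILES from the session in the message, this sketch returns the same strings shown for convert_wildcards_to_closures. Passing its output to Chem.CanonSmiles is still what turns the shared closures into real bonds. Refusing stereo markers outright is a deliberate simplification: moving a wildcard's bond into a ring closure can change neighbor order around a chiral atom or a '/' '\' bond, which is exactly the kind of tricky part the real code has to handle.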