Hi RDKit Community,

I am experimenting with explicit bonds in SMILES. From my understanding, when I create a mol object from a SMILES, the atom indices are preserved and follow the left-to-right order of the atoms in the SMILES.
I thought this might also be true for bond indices, but that does not seem to be the case (see the example below). Is it possible to get bond indices that follow the order of the SMILES?

Thanks,
Vin

    from rdkit import Chem

    smi = "CCc1cc[nH]c1CCC1CCC(CC1)c1cc[nH]c1"
    mol1 = Chem.MolFromSmiles(smi)
    smi_explicit = Chem.MolToSmiles(mol1, allBondsExplicit=True)
    mol2 = Chem.MolFromSmiles(smi_explicit)
    print(smi_explicit)

Output:

    C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1

Here is a manual labeling of the bond indices from left to right; + marks the aromatic bonds:

    C-C-c1:c:c:[nH]:c:1-C-C-C1-C -C -C(-c2:c :c :[nH]:c :2)-C -C -1
     0 1  2 3 4    5 6 7 8 9  10 11 12 13 14 15 16   17 18 19 20 21
          + + +    + +                    +  +  +    +  +

However, the actual indices of the aromatic bonds, found by matching the SMARTS pattern *:*, are as follows:

    smarts = '*:*'
    atom_idx_matches = mol2.GetSubstructMatches(Chem.MolFromSmarts(smarts))

    # get bond idx matches
    # from https://www.rdkit.org/docs/Cookbook.html#returning-substructure-matches-as-smiles
    def get_bond_idx_matches(smarts, mol, match_atom_indices):
        query_mol = Chem.MolFromSmarts(smarts)
        bond_indices = []
        for query_bond in query_mol.GetBonds():
            # map each query-bond end to the matched atom in mol
            atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()]
            atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()]
            bond_indices.append(mol.GetBondBetweenAtoms(
                atom_index1, atom_index2).GetIdx())
        return bond_indices

    bond_idx_matches = []
    for match in atom_idx_matches:
        bond_idx_matches.append(get_bond_idx_matches(smarts, mol2, match))
    print(sorted(bond_idx_matches))

Output:

    [[2], [3], [4], [5], [13], [14], [15], [16], [19], [21]]
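The RDKit indices printed above are consistent with the chain and branch bonds being numbered first, left to right (0 to 18), and the three ring-closure bonds being appended afterwards (19 to 21), but the sketch below does not rely on that. Instead it recomputes left-to-right bond positions from the SMILES string itself. It is a minimal sketch, assuming a hypothetical helper `bonds_in_smiles_order` (not an RDKit function) built on a simple regex tokenizer that only covers the SMILES subset used here: organic-subset and bracket atoms, explicit bond symbols, branches, and digit or %nn ring closures. It returns one (begin_atom, end_atom, bond_symbol) tuple per bond, counting each ring-closure bond where its closing digit appears, which matches the manual labeling above.

```python
import re

# Hypothetical helper (not part of RDKit): a minimal tokenizer for the SMILES
# subset used in this post. It does not validate the input SMILES.
TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:/\\.]|\(|\)|%\d{2}|\d)"
)
BOND_SYMBOLS = {"-", "=", "#", ":", "/", "\\"}


def bonds_in_smiles_order(smiles):
    """Return (begin_atom, end_atom, bond_symbol) tuples in the order in which
    each bond is completed when the SMILES is read left to right.

    Atom indices follow the order in which atoms appear in the string. A
    ring-closure bond is counted at its closing digit; bond_symbol is "" when
    the bond is implicit.
    """
    bonds = []
    open_rings = {}    # ring-closure number -> (atom index, bond symbol at opening)
    branch_stack = []  # branch-point atoms to return to at each ')'
    prev_atom = None   # atom the next bond starts from
    pending_bond = ""  # explicit bond symbol waiting for its second atom
    n_atoms = 0
    for tok in TOKEN_RE.findall(smiles):
        if tok == "(":
            branch_stack.append(prev_atom)
        elif tok == ")":
            prev_atom = branch_stack.pop()
        elif tok == ".":
            prev_atom = None  # '.' separates fragments: no bond
        elif tok in BOND_SYMBOLS:
            pending_bond = tok
        elif tok[0] == "%" or tok.isdigit():
            ring = int(tok.lstrip("%"))
            if ring in open_rings:
                # Closing digit: the ring bond is completed here.
                other, open_symbol = open_rings.pop(ring)
                bonds.append((other, prev_atom, pending_bond or open_symbol))
            else:
                open_rings[ring] = (prev_atom, pending_bond)
            pending_bond = ""
        else:
            # Any other token is an atom (organic subset or bracket atom).
            atom = n_atoms
            n_atoms += 1
            if prev_atom is not None:
                bonds.append((prev_atom, atom, pending_bond))
            prev_atom = atom
            pending_bond = ""
    return bonds


smi_explicit = "C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1"
bonds = bonds_in_smiles_order(smi_explicit)
aromatic = [i for i, (_, _, symbol) in enumerate(bonds) if symbol == ":"]
print(len(bonds), aromatic)  # prints: 22 [2, 3, 4, 5, 6, 14, 15, 16, 17, 18]
```

Because the atom indices of mol2 follow the atom order of smi_explicit, each (begin, end) pair can then be passed to `mol2.GetBondBetweenAtoms(begin, end).GetIdx()` to translate a left-to-right bond position into RDKit's own bond index. Note that smi and smi_explicit list the atoms in different orders, so the helper should be run on the exact string that mol2 was parsed from.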
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss