Hi RDKit Community,
I am experimenting with explicit bonds in SMILES. As I understand it, when I
create a mol object from a SMILES string, the atom indices are preserved and
follow the left-to-right order of the atoms in the SMILES.
I assumed the same would hold for bond indices, but that does not appear to be
the case (see the example below). Is there a way to get bond indices that
follow the order of the bonds in the SMILES?
Thanks, Vin
from rdkit import Chem

smi = "CCc1cc[nH]c1CCC1CCC(CC1)c1cc[nH]c1"
mol1 = Chem.MolFromSmiles(smi)
smi_explicit = Chem.MolToSmiles(mol1, allBondsExplicit=True)
mol2 = Chem.MolFromSmiles(smi_explicit)
print(smi_explicit)
C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1
Here is a manual labeling of the bond indices from left to right, with the
aromatic bonds marked with + (positions 2-6 and 14-18):
smiles  C-C-c1:c:c:[nH]:c:1-C-C-C1-C -C -C(-c2:c :c :[nH]:c :2)-C -C -1
index    0 1  2 3 4    5 6 7 8 9  10 11 12 13 14 15 16   17 18 19 20 21
arom.         + + +    + +                    +  +  +    +  +
However, as shown below, the actual bond indices obtained by SMARTS matching
with the pattern *:* are different:
smarts = '*:*'
atom_idx_matches = mol2.GetSubstructMatches(Chem.MolFromSmarts(smarts))

# Get bond index matches; adapted from
# https://www.rdkit.org/docs/Cookbook.html#returning-substructure-matches-as-smiles
def get_bond_idx_matches(smarts, mol, match_atom_indices):
    query_mol = Chem.MolFromSmarts(smarts)
    bond_indices = []
    for query_bond in query_mol.GetBonds():
        # Map the query bond's atoms onto the matched atoms in mol
        atom_index1 = match_atom_indices[query_bond.GetBeginAtomIdx()]
        atom_index2 = match_atom_indices[query_bond.GetEndAtomIdx()]
        bond_indices.append(
            mol.GetBondBetweenAtoms(atom_index1, atom_index2).GetIdx())
    return bond_indices

bond_idx_matches = []
for match in atom_idx_matches:
    bond_idx_matches.append(get_bond_idx_matches(smarts, mol2, match))
print(sorted(bond_idx_matches))
[[2], [3], [4], [5], [13], [14], [15], [16], [19], [21]]
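For comparison, the manual left-to-right numbering above can be reproduced
without RDKit. The following is only a minimal sketch: the function name
bonds_in_smiles_order and the token pattern are hypothetical, it handles a
simplified SMILES subset (bracket atoms, the common organic-subset atoms,
branches, ring closures), and it counts a ring-closure bond at its closing
digit, as in the manual labeling.

```python
import re

# Token pattern for a simplified SMILES subset (an assumption of this sketch).
_TOKEN = re.compile(
    r"(?P<atom>\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*)"
    r"|(?P<bond>[-=#$:/\\])"
    r"|(?P<ring>%\d{2}|\d)"
    r"|(?P<open>\()"
    r"|(?P<close>\))"
    r"|(?P<dot>\.)"
)


def bonds_in_smiles_order(smiles):
    """Return [(begin_atom, end_atom, symbol), ...] in left-to-right order.

    Atom indices count atoms from left to right; symbol is None when the
    bond is implicit. A ring-closure bond is counted at its closing digit.
    """
    bonds = []
    stack = []       # atoms to return to when a branch closes
    open_rings = {}  # ring label -> (atom index, bond symbol at opening)
    prev = None      # atom the next bond starts from
    pending = None   # bond symbol waiting for its second atom
    n_atoms = 0
    pos = 0
    while pos < len(smiles):
        m = _TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported SMILES syntax at position {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "atom":
            if prev is not None:
                bonds.append((prev, n_atoms, pending))
            prev, pending = n_atoms, None
            n_atoms += 1
        elif kind == "bond":
            pending = text
        elif kind == "ring":
            if text in open_rings:
                start, sym = open_rings.pop(text)
                bonds.append((start, prev, pending or sym))
            else:
                open_rings[text] = (prev, pending)
            pending = None
        elif kind == "open":
            stack.append(prev)
        elif kind == "close":
            prev = stack.pop()
        else:  # "." starts a disconnected fragment
            prev = None
    return bonds


smi_explicit = "C-C-c1:c:c:[nH]:c:1-C-C-C1-C-C-C(-c2:c:c:[nH]:c:2)-C-C-1"
bonds = bonds_in_smiles_order(smi_explicit)
aromatic = [i for i, (_, _, sym) in enumerate(bonds) if sym == ":"]
print(aromatic)
```

If the atom indices of the mol object really do follow the SMILES order, as
assumed at the top of this post, each (begin_atom, end_atom) pair could then be
translated to an RDKit bond index with
mol2.GetBondBetweenAtoms(begin_atom, end_atom).GetIdx(), giving a lookup from
SMILES position to bond index.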