My MCS code finds a subset of the atoms and bonds
in each input structure.
This is a connected fragment. It could be as simple as two
aromatic atoms bonded by an aromatic bond.
I would like to save the fragment to an SD file. This seems
reasonable on the surface. But I don't know enough about the SD
file to know how meaningful that is. (For example, the above as
a 'fragment SMILES' is 'cc', which some programs accept and
many don't.)
RDKit frowns on me trying to do that.
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("c1ccccc1")
>>> emol = Chem.EditableMol(Chem.Mol())
>>> bond = list(mol.GetBonds())[0]
>>> atoms = [bond.GetBeginAtom(), bond.GetEndAtom()]
>>> a1 = emol.AddAtom(atoms[0])
>>> a2 = emol.AddAtom(atoms[1])
>>> a1, a2
(0, 1)
>>> emol.AddBond(a1, a2, bond.GetBondType())
1
>>> mol2 = emol.GetMol()
>>> Chem.MolToMolBlock(mol2)
[02:47:11] non-ring atom 0 marked aromatic
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Sanitization error: non-ring atom 0 marked aromatic
>>>
What I did to support this (again, I don't know if it's
really legitimate, according to the SD file conventions)
is to use "kekulize=False"
>>> print Chem.MolToMolBlock(mol2, kekulize=False)
RDKit
2 1 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0
M END
It this reasonable for what I'm doing?
2) I seem to have a problems with the code which extracts
the fragment from a molecule. Here's a reproducible, where
I think I turn carbon monoxide into a copy of itself, but
the copy can't be used to make a molecule block.
from rdkit import Chem
def subgraph_to_fragment(mol, atom_indices, bond_indices):
emol = Chem.EditableMol(Chem.Mol())
atom_map = {}
for atom_index in atom_indices:
emol.AddAtom(mol.GetAtomWithIdx(atom_index))
atom_map[atom_index] = len(atom_map)
for bond_index in bond_indices:
bond = mol.GetBondWithIdx(bond_index)
emol.AddBond(atom_map[bond.GetBeginAtomIdx()],
atom_map[bond.GetEndAtomIdx()],
bond.GetBondType())
return emol.GetMol()
mol = Chem.MolFromSmiles("O=C")
bond_indices = [0]
## bond = mol.GetBondWithIdx(0)
## atom_indices = [bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()]
atom_indices = [0, 1]
fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
print "====Original\n", Chem.MolToMolBlock(mol, kekulize=False)
print "SMILES", Chem.MolToSmiles(fragment)
print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
However, that gives the output and exception:
====Original
RDKit
2 1 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0
M END
SMILES C=O
====Fragment
[03:12:30]
****
Pre-condition Violation
not initialized
Violation occurred on line 67 in file
/tmp/homebrew-rdkit-HEAD-dB4i/Code/GraphMol/RingInfo.cpp
Failed Expression: df_init
****
Traceback (most recent call last):
File "fail.py", line 26, in <module>
print "====Fragment\n", Chem.MolToMolBlock(fragment, kekulize=False)
RuntimeError: Pre-condition Violation
What am I doing wrong here?
I know from private discussion with Greg on another topic that
a fix to this is either Chem.GetSSSR(fragment) or
Chem.FastFindRings(fragment), as in, for example:
fragment = subgraph_to_fragment(mol, atom_indices, bond_indices)
Chem.FastFindRings(fragment)
That seems to work for me. Why is it needed? And why *isn't*
it needed for the SMILES output?
When building a molecule from scratch, I'm supposed to call
sanitizeMol on the result in order to perceive the chemistry.
However, I don't want to do that on a fragment for fear that
it will modify chemistry under the assumption that it's a
full molecule.
The only other previous match on Google to "Failed Expression: df_init"
is a bug report from JP, where the same exception is raised
after doing ReplaceSubstructs. Greg wrote:
Marking this as "Won't Fix", because the current behavior is
what's supposed to happen: it is expected that the caller
will take care of sanitizing the results of ReplaceSubstructs
if they need them sanitized. The chemical reaction code behaves
similarly.
This seems like a rough edge that's caught a few people
and should be smoothed.
Andrew
[email protected]
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss