Hi All,
While reading molecules from a bulk download of SureChEMBL, I come across a
fair number that fail to parse, and I'm not sure whether they SHOULD parse
or not.

Here is an example: https://www.surechembl.org/chemical/SCHEMBL386
with SMILES code: COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1

Even from reading the SMILES code one can see that there are too many bonds
in there: the middle nitrogen of the azide is neutral, yet it is triply
bonded to one neighbour and doubly bonded to the other (a valence of 5).

Another example: https://www.surechembl.org/chemical/SCHEMBL33957
with SMILES code: NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1

Again, the valence of a nitrogen is off: each neutral [NH] carries a double
bond, a single bond and a hydrogen (a valence of 4).

Should I expect to parse these with RDKit? Might there be some way around
this? It's a significant fraction of the molecules in SureChEMBL.
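
In case it helps, here is a minimal sketch of how the failures can be
reproduced and inspected, assuming a reasonably recent RDKit from Python.
The helper name report_problems is mine and purely illustrative. It reads
each SMILES once normally and once with sanitize=False, then uses
Chem.DetectChemistryProblems to list what RDKit objects to:

```python
from rdkit import Chem, RDLogger

# Silence RDKit's own parse warnings; the problems are collected below instead.
RDLogger.DisableLog("rdApp.*")

# The two SureChEMBL examples quoted in this message.
SMILES = {
    "SCHEMBL386": "COC(=O)C1=C(C=CC=C1)C1=CC=C(C[N+]#[N]=[N-])C=C1",
    "SCHEMBL33957": "NC(N)=[NH]C1=NC(CSCC[NH]=CNS(=O)(=O)C2=CC=C(Br)C=C2)=CS1",
}


def report_problems(smiles):
    """Hypothetical helper: return (parsed_ok, [(problem type, message), ...])."""
    # Default parsing sanitizes the molecule and returns None on failure.
    parsed_ok = Chem.MolFromSmiles(smiles) is not None
    # Skipping sanitization keeps the molecule object so it can be inspected.
    raw = Chem.MolFromSmiles(smiles, sanitize=False)
    if raw is None:
        return parsed_ok, [("SyntaxError", "SMILES could not be read at all")]
    problems = [
        (p.GetType(), p.Message()) for p in Chem.DetectChemistryProblems(raw)
    ]
    return parsed_ok, problems


results = {name: report_problems(smi) for name, smi in SMILES.items()}
for name, (ok, problems) in results.items():
    print(name, "parsed" if ok else "failed to parse")
    for kind, message in problems:
        print("   ", kind, message)
```

This only reports the offending atoms, it does not repair anything, so
whether these structures should be fixed up or skipped is still my question.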

Thanks team!
Lewis
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss