Hi all,

I am trying to get my head around the tautomer function. Here are a few issues. Could you help me get in touch with Tim, or with someone else who can help with the tautomerization functionality?

First, the canonical tautomer function is less tautomeric than I would wish for. The following molecules are not normalized correctly: I get two different tautomeric versions. Test file and debugging log files are attached. In both cases I would expect:

CC(=O)Cc1ccccc1

Second, the full enumeration is just not working. When "-c" is NOT defined, I would expect to get all possible tautomers as defined by the Functor class

class Functor : public OpenBabel::TautomerFunctor

but I just get one structure all the time. Any clues what goes wrong?

Third, I would highly recommend that we replace the tautomerization framework with an alternative solution, e.g. the SMIRKS enumeration from Markus Sitzmann. The SMIRKS patterns are part of his publication:

Article (sin10)
Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
Tautomerism in large databases
J Comput Aided Mol Des, 2010, 24, 521-551
DOI 10.1007/s10822-010-9346-4
PMID 20512400

In other words, given the SMIRKS and ranking rules, we just need to apply the rules recursively, store the unique canonical SMILES, rank them, and take the highest-scoring one as the tautomeric SMILES. Or we should at least put the interfaces in place to allow users to use their tautomerization framework of choice.

Thoughts?

P.S.: Anyone who can take this on?

Cheers
/.Joerg
https://plus.google.com/116731043002877336055/
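To make the third point concrete, here is a minimal sketch of the "apply recursively, keep unique SMILES, rank, take the best" procedure. It is not Open Babel code and does not use a real SMIRKS engine: `toy_neighbors` and `toy_score` are hypothetical stand-ins, the transform table only contains the two structures from the attached logs, and the strings are assumed to be canonical SMILES already.

```python
from typing import Callable, Iterable, Set


def enumerate_tautomers(start: str,
                        neighbors: Callable[[str], Iterable[str]]) -> Set[str]:
    """Collect every structure reachable from `start` by repeated rule application.

    `neighbors` is assumed to return the canonical SMILES of all products
    obtained by applying every transformation rule once to its argument.
    """
    seen = {start}
    stack = [start]          # explicit worklist instead of recursive calls
    while stack:
        current = stack.pop()
        for product in neighbors(current):
            if product not in seen:   # store each unique SMILES only once
                seen.add(product)
                stack.append(product)
    return seen


def canonical_tautomer(start: str,
                       neighbors: Callable[[str], Iterable[str]],
                       score: Callable[[str], int]) -> str:
    """Return the highest-scoring tautomer reachable from `start`."""
    candidates = enumerate_tautomers(start, neighbors)
    # Ties are broken by the SMILES string itself so the result is deterministic.
    return max(candidates, key=lambda smiles: (score(smiles), smiles))


# Hypothetical stand-in for a SMIRKS engine: a hard-coded keto/enol pair.
TOY_TRANSFORMS = {
    "CC(=O)Cc1ccccc1": ["CC(=Cc1ccccc1)O"],
    "CC(=Cc1ccccc1)O": ["CC(=O)Cc1ccccc1"],
}


def toy_neighbors(smiles: str) -> Iterable[str]:
    return TOY_TRANSFORMS.get(smiles, [])


def toy_score(smiles: str) -> int:
    # Crude stand-in for the C=O row of the scoring table (+2 per "=O").
    return 2 * smiles.count("=O")


for query in ("CC(=O)Cc1ccccc1", "CC(=Cc1ccccc1)O"):
    print(query, "->", canonical_tautomer(query, toy_neighbors, toy_score))
```

With these stand-ins both inputs map to the keto form, which is the behaviour I would expect from the canonical tautomer function. A real implementation would replace `toy_neighbors` with SMIRKS matching plus canonicalization and `toy_score` with the full ranking table.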
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Other
7: Hybridized
8: Acceptor
9: Other
Bond Types:
0: Unassigned
1: Unassigned
2: Unassigned
3: Unassigned
4: Unassigned
5: Unassigned
6: Assigned
7: Assigned
8: Unassigned
9: Assigned
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Other
7: Hybridized
8: Unassigned
9: Other
EnumerateRecursive
Assigned 8 Acceptor
-> Rule 5: Assign 7-8 Double
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
Change?
8
A
8
A
Backtrack... 8
CC(=O)Cc1ccccc1
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Hybridized
7: Hybridized
8: Donor
9: Other
Bond Types:
0: Unassigned
1: Unassigned
2: Unassigned
3: Unassigned
4: Unassigned
5: Unassigned
6: Unassigned
7: Unassigned
8: Unassigned
9: Assigned
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Hybridized
7: Hybridized
8: Unassigned
9: Other
EnumerateRecursive
Assigned 8 Donor
-> Rule 1: Assign 7-8 Single
-> Rule 5: Assign 6-7 Double
-> Rule 4: Assign 0-6 Single
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
Change?
8
D
Change 8 to Acceptor
-> Rule 5: Assign 1-2 Double
-> Rule 5: Assign 7-8 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 6-7 Single
-> Rule 5: Assign 3-4 Double
-> Rule 5: Assign 0-6 Double
-> Rule 4: Assign 5-0 Single
-> Rule 4: Assign 4-5 Single
invalid Acceptor/Hybridized 1
8
A
Backtrack... 8
CC(=Cc1ccccc1)O
test4.sdf
Description: Binary data
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Other
7: Hybridized
8: Acceptor
9: Other
Bond Types:
0: Unassigned
1: Unassigned
2: Unassigned
3: Unassigned
4: Unassigned
5: Unassigned
6: Assigned
7: Assigned
8: Unassigned
9: Assigned
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Other
7: Hybridized
8: Unassigned
9: Other
EnumerateRecursive
Assigned 8 Acceptor
-> Rule 5: Assign 7-8 Double
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
CC(=O)Cc1ccccc1
Change?
8
A
8
A
Backtrack... 8
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Hybridized
7: Hybridized
8: Donor
9: Other
Bond Types:
0: Unassigned
1: Unassigned
2: Unassigned
3: Unassigned
4: Unassigned
5: Unassigned
6: Unassigned
7: Unassigned
8: Unassigned
9: Assigned
Atom Types:
0: Hybridized
1: Hybridized
2: Hybridized
3: Hybridized
4: Hybridized
5: Hybridized
6: Hybridized
7: Hybridized
8: Unassigned
9: Other
EnumerateRecursive
Assigned 8 Donor
-> Rule 1: Assign 7-8 Single
-> Rule 5: Assign 6-7 Double
-> Rule 4: Assign 0-6 Single
-> Rule 5: Assign 5-0 Double
-> Rule 5: Assign 1-2 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 4-5 Single
-> Rule 5: Assign 3-4 Double
--> LeafNode reached...
C/C(=C\c1ccccc1)/O
Change?
8
D
Change 8 to Acceptor
-> Rule 5: Assign 1-2 Double
-> Rule 5: Assign 7-8 Double
-> Rule 4: Assign 2-3 Single
-> Rule 4: Assign 6-7 Single
-> Rule 5: Assign 3-4 Double
-> Rule 5: Assign 0-6 Double
-> Rule 4: Assign 5-0 Single
-> Rule 4: Assign 4-5 Single
invalid Acceptor/Hybridized 1
8
A
Backtrack... 8
#Article (sin10)
#Sitzmann, M.; Ihlenfeldt, W.-D. & Nicklaus, M. C.
#Tautomerism in large databases
#J Comput Aided Mol Des, 2010, 24, 521-551
#DOI 10.1007/s10822-010-9346-4
#PMID 20512400
Rule 1: 1,3 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]
Rule 2: 1,5 (thio)keto/(thio)enol
[O,S,Se,Te;X1:1]=[Cz1H0:2][C:5]=[C:6][CX4z0,NX3:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][Cz1:2]=[C:5][C:6]=[Cz0,N:3]
Rule 4: special imine
[Cz0R0X3:1]([C:5])=[C:2][Nz0:3][#1:4]>>[#1:4][Cz0R0X4:1]([C:5])[c:2]=[nz0:3]
Rule 5: 1,3 aromatic heteroatom H shift
[#1:4][N:1][C;e6:2]=[O,NX2:3]>>[NX2,nX2:1]=[C,c;e6:2][O,N:3][#1:4]
Rule 6: 1,3 heteroatom H shift
[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
Rule 7: 1,5 (aromatic) heteroatom H shift (1)
[nX2,NX2,S,O,Se,Te:1]=[C,c,nX2,NX2:6][C,c:5]=[C,c,nX2:2][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][C,c,nX2,NX2:6]=[C,c:5][C,c,nX2:2]=[NX2,S,O,Se,Te:3]
Rule 8: 1,5 aromatic heteroatom H shift (2)
[n,s,o:1]=[c,n:6][c:5]=[c,n:2][n,s,o:3][#1:4]>>[#1:4][n,s,o:1][c,n:6]=[c:5][c,n:2]=[n,s,o:3]
Rule 9: 1,7 (aromatic) heteroatom H shift
[nX2,NX2,S,O,Se,Te,Cz0X3:1]=[c,C,NX2,nX2:6][C,c:5]=[C,c,NX2,nX2:2][C,c,NX2,nX2:7]=[C,c,NX2,nX2:8][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te,Cz0X4:1][C,c,NX2,nX2:6]=[C,c:5][C,c,NX2,nX2:2]=[C,c,NX2,nX2:7][C,c,NX2,nX2:8]=[NX2,S,O,Se,Te:3]
Rule 10: 1,9 (aromatic) heteroatom H shift
[#1:1][n,N,O:2][c,nX2,C:3]=[c,nX2,C:4][c,nX2:5]=[c,nX2:6][c,nX2:7]=[c,nX2:8][c,nX2,C:9]=[n,N,O:10]>>[N,n,O:2]=[C,c,nX2:3][c,nX2:4]=[c,nX2:5][c,nX2:6]=[c,nX2:7][c,nX2:8]=[c,nX2:9][n,O:10][#1:1]
Rule 11: 1,11 (aromatic) heteroatom H shift
[#1:1][n,N,O:2][c,nX2,C:3]=[c,nX2,C:4][c,nX2:5]=[c,C,nX2:6][c,C,nX2:7]=[c,C,nX2:8][c,nX2,C:9]=[c,C,nX2:10][c,C,nX2:11]=[nX2,NX2,O:12]>>[NX2,nX2,O:2]=[C,c,nX2:3][c,C,nX2:4]=[c,C,nX2:5][c,C,nX2:6]=[c,C,nX2:7][c,C,nX2:8]=[c,C,nX2:9][c,C,nX2:10]=[c,C,nX2:11][nX2,O:12][#1:1]
Rule 12: furanones
[#1:1][O,S,N:2][c,C;z2;r5:3]=[C,c;r5:4][c,C;r5:5]>>[O,S,N:2]=[Cz2r5:3][C&r5R{0-2}:4]([#1:1])[C,c;r5:5]
Rule 13: keten/ynol exchange
[O,S,Se,Te;X1:1]=[C:2]=[C:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][C:2]#[C:3]
Rule 14: ionic nitro/aci-nitro
[#1:1][C:2][N?:3]([O:5])=[O:4]>>[C:2]=[N?:3]([O-:5])[O:4][#1:1] checkcharges
Rule 15: pentavalent nitro/aci-nitro
[#1:1][C:2][N:3](=[O:5])=[O:4]>>[C:2]=[N:3](=[O:5])[O:4][#1:1]
Rule 16: oxim/nitroso
[#1:1][O:2][Nz1:3]=[C:4]>>[O:2]=[Nz1:3][C:4][#1:1]
Rule 18: cyanic/iso-cyanic acids
[#1:1][O:2][C:3]#[N:4]>>[O:2]=[C:3]=[N:4][#1:1]
Rule 19: formamidinesulfinic acids
[#1:1][O,N:2][C:3]=[S,Se,Te:4]=[O:5]>>[O,N:2]=[C:3][S,Se,Te:4][O:5][#1:1]
Rule 20: isocyanides
[#1:1][C0:2]#[N0:3]>>[C:2]#[N?:3][#1:1] checkcharges checkaro
Rule 21: phosphonic acids
[#1:1][O:2][P:3]>>[O:2]=[P:3][#1:1]
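As a starting point for loading the rule list above, the following sketch parses the "Rule N: name" / SMIRKS line pairs into a simple list of records. The record layout and the function name `parse_rules` are my own assumptions, and the sketch only splits the text; it does not validate the SMIRKS or interpret the extended atom primitives (`z`, `e`, `R{...}`).

```python
from typing import Dict, List


def parse_rules(text: str) -> List[Dict[str, object]]:
    """Split a rule file into records with name, reactant, product and flags.

    Assumed layout: a line starting with "Rule " carries the name, and the
    next line containing ">>" carries the SMIRKS, optionally followed by
    whitespace-separated flags such as "checkcharges".
    """
    rules = []
    pending_name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("Rule "):
            pending_name = line
            continue
        if ">>" in line and pending_name is not None:
            smirks, *flags = line.split()  # the SMIRKS itself has no spaces
            reactant, product = smirks.split(">>", 1)
            rules.append({
                "name": pending_name,
                "reactant": reactant,
                "product": product,
                "flags": flags,
            })
            pending_name = None
    return rules


SAMPLE = """#Article (sin10)
Rule 13: keten/ynol exchange
[O,S,Se,Te;X1:1]=[C:2]=[C:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][C:2]#[C:3]
Rule 20: isocyanides
[#1:1][C0:2]#[N0:3]>>[C:2]#[N?:3][#1:1] checkcharges checkaro
"""

for rule in parse_rules(SAMPLE):
    print(rule["name"], rule["flags"])
```

Keeping reactant and product separate makes it easy to apply each rule in both directions later, if the chosen SMIRKS engine needs that.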
#Scoring
Structure fragment                                                            Scoring points
Each carbocyclic aromatic ring                                                +150
Each aromatic ring                                                            +100
Each benzoquinones (including imine and thio analogs,
  [C]1([C]=[C][C]([C]=[C]1)=,:[N,S,O])=,:[N,S,O], penalize
  cyclohexanetetrone-like structures)                                         +25
Each oxim group (C=N[OH])                                                     +4
Each double bond between a carbon atom (C) and an oxygen atom (O)             +2
Each double bond between a nitrogen atom (N) and an oxygen atom (O)           +2
Each double bond between a phosphorus atom (P) and an oxygen atom (O)         +2
Each non-aromatic double bond between a carbon atom (C) and a heteroatom (X)  +1
Each methyl group (penalize structures with terminal double bonds)            +1
Each guanidine group with a double bond on the terminal nitrogen atom
  (NC(=N)[N][!H])                                                             +1
Each guanidine group with an endocyclic double bond ([N;R][C;R]([N])=[N;R])   +2
Each P-H, S-H, Se-H and Te-H bond                                             -1
Each aci-nitro group (C=N(=O)[OH])                                            -4
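
The scoring table can be read as a weighted sum over fragment counts. The sketch below encodes the weights from the table and scores the two structures from the logs. The dictionary keys are names I made up, the fragment counts are hand-assigned rather than computed by substructure matching, and I assume the rows are additive and that a C=O bond also counts toward the C=X row. Whether a benzene ring additionally earns the generic aromatic-ring row is left open; it would add the same amount to both forms.

```python
# Weights copied from the scoring table; key names are illustrative only.
SCORES = {
    "carbocyclic_aromatic_ring": 150,
    "aromatic_ring": 100,
    "benzoquinone": 25,
    "oxime": 4,
    "c_o_double_bond": 2,
    "n_o_double_bond": 2,
    "p_o_double_bond": 2,
    "c_x_double_bond": 1,            # non-aromatic C=X, X = heteroatom
    "methyl": 1,
    "guanidine_terminal_double_bond": 1,
    "guanidine_endocyclic_double_bond": 2,
    "xh_bond": -1,                   # each P-H, S-H, Se-H or Te-H bond
    "aci_nitro": -4,
}


def total_score(counts):
    """Sum weight * count over all fragments present in `counts`."""
    unknown = set(counts) - set(SCORES)
    if unknown:
        raise KeyError("unknown fragment(s): " + ", ".join(sorted(unknown)))
    return sum(SCORES[name] * n for name, n in counts.items())


# Hand-counted fragments (an assumption, not the output of a matcher).
keto = {  # CC(=O)Cc1ccccc1
    "carbocyclic_aromatic_ring": 1,
    "c_o_double_bond": 1,
    "c_x_double_bond": 1,
    "methyl": 1,
}
enol = {  # CC(=Cc1ccccc1)O
    "carbocyclic_aromatic_ring": 1,
    "methyl": 1,
}

print("keto:", total_score(keto))
print("enol:", total_score(enol))
```

Under these assumptions the keto form scores higher than the enol form, which matches the CC(=O)Cc1ccccc1 result I would expect in both test cases.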
