Hi,
I'm doing regressions for my chemfp package.
I have OpenBabel 2.2.3, 2.3.0 and today's "2.3.90" compiled from SVN, each
compiled against various versions of Python. What's below is with Python 2.6.
It looks like there's a change to the FP4 fingerprints, but I can't figure
out why.
Here's a reproducible example:
---> Version 2.2.3 and 2.3.0 <---
bash-3.2$ /Users/dalke/envs/py26-ob230/bin/python2.6
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbabel
>>> openbabel.OBReleaseVersion()
'2.3.0'
>>>
>>> import pybel
>>> reader = pybel.readfile("sdf","tests/pubchem.sdf")
>>> mol=reader.next()
>>> mol.write("smi")
'Clc1c(/C=C/C(=O)NNC(=O)Cn2nc(cc2C)C)c(F)ccc1\t9425004\n'
>>> mol.calcfp("FP4").bits
[1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 295, 300, 301, 302, 303]
---> Version 2.3.90 (built today from SVN) <---
bash-3.2$ /Users/dalke/envs/py26-ob23svn1/bin/python2.6
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbabel
>>> openbabel.OBReleaseVersion()
'2.3.90'
>>> import pybel
>>> reader = pybel.readfile("sdf","tests/pubchem.sdf")
>>> mol = reader.next()
>>> mol.write("smi")
'Clc1c(/C=C/C(=O)NNC(=O)Cn2nc(cc2C)C)c(F)ccc1\t9425004\n'
>>> mol.calcfp("FP4").bits
[1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 289, 290, 295, 300, 301, 302, 303]
You can see that the subversion build sets two new bits: 289 and 290.
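To double-check that those are the only differences, here is a minimal sketch
that compares the two bit lists copied from the sessions above. The variable
names are mine and only for illustration.

```python
# Bit lists copied verbatim from the 2.3.0 and 2.3.90 sessions above.
bits_230 = [1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287,
            295, 300, 301, 302, 303]
bits_svn = [1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287,
            289, 290, 295, 300, 301, 302, 303]

# Bits set only by the SVN build, and bits set only by 2.3.0.
added = sorted(set(bits_svn) - set(bits_230))
removed = sorted(set(bits_230) - set(bits_svn))

print("added: %s" % added)      # added: [289, 290]
print("removed: %s" % removed)  # removed: []
```

So the SVN build sets everything 2.3.0 does, plus the two extra bits.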
The SMARTS_InteLigand.txt file is unchanged between the releases. Indeed,
SVN shows no change to it for many years.
The canonical SMILES are identical, so it's unlikely to be an aromaticity
perception issue.
I believe pybel's bit numbering starts from 1, since when I extract the SMARTS
definitions from the SMARTS file and number them, the first one (which I've
labeled '1') is:
1 Primary_carbon: [CX4H3][#6]
This means that bits 289 and 290 correspond to the two patterns that use the
"D" (degree) primitive together with '/' and '\' bonds:
288 Conjugated_tripple_bond: *#*[*]=,#,:[*]
289 Cis_double_bond: */[D2]=[D2]\*
290 Trans_double_bond: */[D2]=[D2]/*
291 Mixed_anhydrides:
[$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))][#8X2][$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))]
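The numbering above was done by hand. A rough sketch of the same idea follows,
assuming each definition is a "Name: SMARTS" line and that blank lines and
lines starting with '#' do not consume a bit number. Whether Open Babel itself
assigns bits exactly this way is an assumption on my part, and the helper name
number_patterns and the sample lines are made up for illustration.

```python
def number_patterns(lines):
    """Return {bit_number: (name, smarts)}, numbering from 1."""
    numbered = {}
    bit = 0
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comments are skipped, not numbered.
        if not line or line.startswith("#"):
            continue
        # Split on the first ':' only, since SMARTS may itself contain ':'.
        name, _, rest = line.partition(":")
        fields = rest.split()
        smarts = fields[0] if fields else ""
        bit += 1
        numbered[bit] = (name.strip(), smarts)
    return numbered


# Hypothetical three-line excerpt, not the real file, so the numbers
# here are 1..3 rather than 1, 289 and 290.
sample = [
    "# illustrative excerpt only",
    "Primary_carbon: [CX4H3][#6]",
    "Cis_double_bond: */[D2]=[D2]\\*",
    "Trans_double_bond: */[D2]=[D2]/*",
]

for bit, (name, smarts) in sorted(number_patterns(sample).items()):
    print("%d %s: %s" % (bit, name, smarts))
```

Passing the lines of the full SMARTS_InteLigand.txt instead of the sample
should reproduce the hand-numbered listing, if the assumptions above hold.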
Has the SMARTS matching for either of those patterns changed? (I'm betting
on the stereo handling.)
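One way to separate the SMARTS matcher from the fingerprint code would be to
run the two patterns directly against the molecule under each build. This is
only a sketch: it uses pybel.readfile and pybel.Smarts(...).findall, the
helper names (load_pybel, match_stereo_patterns, check_file) are mine, and I
have not recorded its output here.

```python
# The two patterns in question, keyed by the bit numbers used above.
STEREO_PATTERNS = {
    289: ("Cis_double_bond", "*/[D2]=[D2]\\*"),
    290: ("Trans_double_bond", "*/[D2]=[D2]/*"),
}


def load_pybel():
    """Import pybel under either the newer or the 2.x module layout."""
    try:
        from openbabel import pybel  # layout used by later releases
    except ImportError:
        import pybel  # top-level module, as in the sessions above
    return pybel


def match_stereo_patterns(pybel, mol):
    """Return {bit: list of matched atom-index tuples} for one molecule."""
    hits = {}
    for bit, (name, smarts) in sorted(STEREO_PATTERNS.items()):
        hits[bit] = pybel.Smarts(smarts).findall(mol)
    return hits


def check_file(path):
    """Print match counts for the first record of an SD file."""
    pybel = load_pybel()
    mol = next(pybel.readfile("sdf", path))
    for bit, matches in sorted(match_stereo_patterns(pybel, mol).items()):
        print("%d %s: %d match(es)" % (bit, STEREO_PATTERNS[bit][0], len(matches)))
```

Calling check_file("tests/pubchem.sdf") in each environment would show whether
the match counts themselves differ between 2.3.0 and the SVN build.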
Andrew
[email protected]
Here's the original structure file:
===================================
9425004
-OEChem-01150805002D
40 41 0 0 0 0 0 0 0999 V2000
2.0000 -3.0580 0.0000 Cl 0 0 0 0 0 0 0 0 0 0 0 0
5.4641 -3.0580 0.0000 F 0 0 0 0 0 0 0 0 0 0 0 0
7.1962 0.9420 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -0.0580 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
7.1962 2.9420 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
7.3007 3.9365 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
5.4641 0.9420 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
5.4641 -0.0580 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
8.1097 2.5353 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.3301 2.4420 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
8.7788 3.2784 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
8.2788 4.1444 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
8.3176 1.5571 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.3301 1.4420 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
8.6856 5.0580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -3.0580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -2.0580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5981 -0.5580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -3.5580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5981 -3.5580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5981 -1.5580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -4.5580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5981 -4.5580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -5.0580 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.1181 3.0246 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.7195 2.3343 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
9.3954 3.2136 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
7.7112 1.4282 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
8.4465 0.9507 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
8.9241 1.6860 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
8.1192 5.3102 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
8.9377 5.6244 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
9.2520 4.8058 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.9272 1.2520 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
6.0010 -0.3680 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
3.1951 -1.7480 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.1350 -1.8680 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2.3291 -4.8680 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
5.1350 -4.8680 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -5.6780 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 19 1 0 0 0 0
2 20 1 0 0 0 0
3 14 2 0 0 0 0
4 18 2 0 0 0 0
5 6 1 0 0 0 0
5 9 1 0 0 0 0
5 10 1 0 0 0 0
6 12 2 0 0 0 0
7 8 1 0 0 0 0
7 14 1 0 0 0 0
7 34 1 0 0 0 0
8 18 1 0 0 0 0
8 35 1 0 0 0 0
9 11 2 0 0 0 0
9 13 1 0 0 0 0
10 14 1 0 0 0 0
10 25 1 0 0 0 0
10 26 1 0 0 0 0
11 12 1 0 0 0 0
11 27 1 0 0 0 0
12 15 1 0 0 0 0
13 28 1 0 0 0 0
13 29 1 0 0 0 0
13 30 1 0 0 0 0
15 31 1 0 0 0 0
15 32 1 0 0 0 0
15 33 1 0 0 0 0
16 17 1 0 0 0 0
16 19 1 0 0 0 0
16 20 2 0 0 0 0
17 21 2 0 0 0 0
17 36 1 0 0 0 0
18 21 1 0 0 0 0
19 22 2 0 0 0 0
20 23 1 0 0 0 0
21 37 1 0 0 0 0
22 24 1 0 0 0 0
22 38 1 0 0 0 0
23 24 2 0 0 0 0
23 39 1 0 0 0 0
24 40 1 0 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
9425004
$$$$
===================================
_______________________________________________
OpenBabel-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss