Hi all,

I'm trying to use CDK for detecting pharmacophores. It's easy enough
to get SMARTSQuery to find patterns, but a major problem is that the
returned atoms (indices) do not respect the ordering of atoms in
the original SMARTS string. For example, the query string "[O;X2]C"
will, in one of my examples, return two atoms in this order:

#<Atom Atom(745381933, S:C, 3D:[(-4.8833, 0.8605, 0.0308)],
AtomType(745381933, N:C.sp3, FC:0, H:SP3, NC:4, EV:4,
Isotope(745381933, Element(745381933, S:C, ID:C1))))>
#<Atom Atom(1362034980, S:O, 3D:[(-3.6751, 1.6658, 0.029)],
AtomType(1362034980, N:O.sp3, FC:0, H:SP3, NC:2, EV:2,
Isotope(1362034980, Element(1362034980, S:O, ID:O1))))>

That is, the carbon is returned before the oxygen. I'm only interested
in the oxygen, as that is the defining point of this feature, but I
cannot assume that it will be the first atom returned (or the last,
for that matter). As far as I can see, the order in which SMARTSQuery
returns matched atoms is arbitrary, which makes it very hard to use as
a feature extraction tool.
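
The only stopgap I can think of is to ignore the returned order and
pick the feature atom by element symbol. Below is a minimal sketch of
that idea in plain Java, without any CDK calls. The class and method
names are made up for illustration, and I'm assuming each match is
available as a list of atom indices and that the element symbol of
every atom can be looked up by index.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not part of CDK.
final class FeatureAtomPicker {
    private FeatureAtomPicker() {}

    /**
     * Returns the index of the first atom in the match whose element
     * symbol equals wantedSymbol, wherever it sits in the match.
     *
     * match   - atom indices of one match, in whatever order they came back
     * symbols - element symbol of every atom in the molecule, indexed by
     *           atom number (assumed stand-in for a per-atom symbol lookup)
     */
    static OptionalInt pickBySymbol(List<Integer> match,
                                    List<String> symbols,
                                    String wantedSymbol) {
        for (int atomIndex : match) {
            if (wantedSymbol.equals(symbols.get(atomIndex))) {
                return OptionalInt.of(atomIndex);
            }
        }
        // No atom with the wanted symbol in this match.
        return OptionalInt.empty();
    }
}
```

This only helps when the feature atom is the only atom of its element
in the pattern, as in "[O;X2]C". For a pattern such as "CC" it cannot
tell the two atoms apart, so it is not a general solution.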

I have been looking through the source code of SMARTSQuery, SMARTSParser
and UniversalIsomorphismTester to no avail. As far as I can tell from
the source, it is impossible to extract the atoms in the correct order
without completely rewriting the SMARTS matching functionality, including
the isomorphism tester.

Does anybody know if I am mistaken? It would really make life much
easier for me if it were possible to extract the atoms in the order in
which they appear in the SMARTS string.

Best regards,

Thomas
