Hi John!
Thanks for the research on this. It would have taken me a lot of time to
find this out...
So [C]1[C][C][C][C]1 is perceived as aromatic... This is consistent with the
different behavior I see when I run the same code with six-membered rings
instead of five-membered rings. For six-membered rings there's no problem,
presumably because they are not perceived as aromatic.
So what I do is first clone the original atom container (so that the implicit
H-counts of the original are not updated), and then run atom typing and
hydrogen adding on each of the IRings:
IRingSet ringSet = new SSSRFinder(m.clone()).findSSSR(); // find SSSR rings
for (IAtomContainer ring : ringSet.atomContainers()) {
    AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(ring);
    CDKHydrogenAdder.getInstance(blr).addImplicitHydrogens(ring);
    boolean found = sqt.matches(ring); // true
}
regards,
Nick
________________________________________
From: John May [[email protected]]
Sent: Friday, February 07, 2014 6:48 PM
To: Nick Vandewiele
Cc: [email protected]
Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen
conversion and SSSRing finder
Okay now I’ve actually tracked it down - the issue is to do with aromaticity
(kind of) and the SSSR providing a container for the ring atoms/bonds.
With implicit hydrogens the substructure from the SSSRFinder looks like this…
[CH]1CCCC1
Note that C1(O)CCCC1 is really [CH]1([OH])[CH2][CH2][CH2][CH2]1. In the CDK,
removing atoms doesn’t update the hydrogen counts of neighbouring atoms, hence
the first carbon keeps an implicit hydrogen count of 1.
When all hydrogens are explicit we get
[C]1[C][C][C][C]1
For some reason the aromaticity algorithm finds it to be aromatic. I can fix
that, but for now you can update the valences (i.e. AtomType/AddHydrogens) - but
consider this: the atoms in the IRing are the same objects as in the molecule,
so adjusting the hydrogen count for the ring atoms would also affect the parent
molecule. You can even run it and you’ll get...
[CH2]1([CH2](O[H])([CH2]([CH2]([CH2]1([H])[H])([H])[H])([H])[H])[H])([H])[H]
It would be even worse when there are multiple rings. I’ve never liked IRing
anyway - much better to refer to rings by index without creating a new
container.
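
To illustrate the shared-atom point above, here is a minimal toy model in plain Java. It is not the CDK API: the Atom class, the list-based "containers" and all method names are hypothetical stand-ins, and fillRingCarbonsToCH2 only imitates what re-running hydrogen adding on an isolated five-membered carbon ring would do. It is a sketch of the hazard under those assumptions, and of why working on a copy (or referring to ring atoms by index) avoids it.

```java
import java.util.ArrayList;
import java.util.List;

class SharedRingAtoms {

    /** Toy atom (hypothetical, not a CDK class): element symbol plus implicit H count. */
    static final class Atom {
        final String symbol;
        int implicitH;
        Atom(String symbol, int implicitH) {
            this.symbol = symbol;
            this.implicitH = implicitH;
        }
    }

    /** Toy cyclopentanol, C1(O)CCCC1, i.e. [CH]1([OH])[CH2][CH2][CH2][CH2]1. */
    static List<Atom> cyclopentanol() {
        List<Atom> atoms = new ArrayList<>();
        atoms.add(new Atom("C", 1)); // index 0: carbon bearing the OH
        atoms.add(new Atom("O", 1)); // index 1: hydroxyl oxygen
        for (int i = 0; i < 4; i++) {
            atoms.add(new Atom("C", 2)); // indices 2..5: ring CH2 groups
        }
        return atoms;
    }

    /** Ring "container" that reuses the parent's Atom objects (the hazard). */
    static List<Atom> ringView(List<Atom> molecule, int[] ringIndices) {
        List<Atom> ring = new ArrayList<>();
        for (int idx : ringIndices) {
            ring.add(molecule.get(idx));
        }
        return ring;
    }

    /** Ring container built from copies, so edits stay local to the ring. */
    static List<Atom> ringCopy(List<Atom> molecule, int[] ringIndices) {
        List<Atom> ring = new ArrayList<>();
        for (int idx : ringIndices) {
            Atom a = molecule.get(idx);
            ring.add(new Atom(a.symbol, a.implicitH));
        }
        return ring;
    }

    /** Stand-in for hydrogen adding on an isolated carbon ring: every carbon becomes CH2. */
    static void fillRingCarbonsToCH2(List<Atom> ring) {
        for (Atom a : ring) {
            if (a.symbol.equals("C")) {
                a.implicitH = 2;
            }
        }
    }

    public static void main(String[] args) {
        int[] ringIndices = {0, 2, 3, 4, 5}; // the five ring carbons, by index

        List<Atom> shared = cyclopentanol();
        fillRingCarbonsToCH2(ringView(shared, ringIndices));
        // prints 2: the parent's carbinol carbon was changed through the ring
        System.out.println("shared ring, parent C0 implicitH = " + shared.get(0).implicitH);

        List<Atom> safe = cyclopentanol();
        fillRingCarbonsToCH2(ringCopy(safe, ringIndices));
        // prints 1: the parent is untouched because the ring holds copies
        System.out.println("copied ring, parent C0 implicitH = " + safe.get(0).implicitH);
    }
}
```

In the shared case the parent molecule ends up with an over-saturated carbon next to the oxygen, which is the same kind of corruption as in the SMILES shown above; keeping only the index array and never building a separate ring container sidesteps the problem entirely.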
Cheers,
J
On 7 Feb 2014, at 17:12, John May <[email protected]> wrote:
Doh - of course. So the SMARTS has a quirk that ‘C1’ matches the
‘CDKConstants.ISINRING’ flag.
We can fix this without a patch - just add this before you match. The
SMARTSQueryTool should be doing it already - not sure why it isn’t though….
(that’s the bug)
SmartsMatchers.prepare(ring, true);
On 7 Feb 2014, at 17:03, John May <[email protected]> wrote:
No problem.
I’m on master, but nothing should have changed…
J
On 7 Feb 2014, at 16:47, Nick Vandewiele <[email protected]> wrote:
John,
Thanks for the fast response!
However: adding or removing dashes in the SMARTS string doesn’t change the
outcome when I try it.
Also, using your proposed alternative, e.g.:
Pattern pattern = Ullmann.findSubstructure(SMARTSParser.parse(smarts, blr));
for (IAtomContainer ring : ringSet.atomContainers()) {
    System.out.println(pattern.matches(ring));
}
does not change the outcome (i.e. false) for me either.
Are you using the 1.5.4 or master branch?
Regards,
Nick
From: John May [mailto:[email protected]]
Sent: Friday, February 07, 2014 5:28 PM
To: Nick Vandewiele
Cc: [email protected]
Subject: Re: [Cdk-user] SMARTS matching after implicit to explict hydrogen
conversion and SSSRing finder
Okay it’s the bond matching… C-1-C-C-C-C1 works but C1-C-C-C-C1 doesn’t.
Should be an easy fix.
J
On 7 Feb 2014, at 16:03, Nick Vandewiele <[email protected]> wrote:
Hi,
I am using CDK 1.5.4 and detected some behavior of the SMARTS matcher that I
didn’t quite understand.
When I search for a SMARTS pattern in one of the rings detected using the
SSSRFinder algorithm, the success of finding the pattern in the ring depends on
whether implicit hydrogens were converted to explicit ones, or not.
If explicit hydrogens are present, the pattern is not found. If only implicit
hydrogens are present, the pattern IS found.
This code was used:
String smiles = "C1C(O)CCC1";
IChemObjectBuilder blr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(blr);
IAtomContainer m = smipar.parseSmiles(smiles);
String smarts = "C1-C-C-C-C1";
SMARTSQueryTool sqt = new SMARTSQueryTool(smarts, blr);
AtomContainerManipulator.convertImplicitToExplicitHydrogens(m);
IRingSet ringSet = new SSSRFinder(m).findSSSR(); // find SSSR rings
for (IAtomContainer ring : ringSet.atomContainers()) {
    boolean found = sqt.matches(ring); // false (should be true)
}
Although the release notes of 1.5.4 are very informative, I couldn’t find an
answer explaining this behavior.
So my question is two-fold:
1) how do I ensure that the pattern is found, even when explicit hydrogens
are used in the atomcontainer?
2) What is happening under the hood here? Is this behavior normal?
Regards,
Nick
------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user