Hi,

Some notes from Geoff regarding the topic:

I'm not sure. We save the constraints before doing anything:
> https://github.com/openbabel/openbabel/blob/master/src/distgeom.cpp#L176
>
> The original code for CheckStereoConstraints was here:
> https://github.com/openbabel/openbabel/blob/master/src/distgeom.cpp#L912
>
> Then we check if everything is okay before declaring success:
> https://github.com/openbabel/openbabel/blob/master/src/distgeom.cpp#L1173
>
> In short, there should be no way to exit that loop unless
> CheckStereoConstraints returns true. And yet, we'd have cases where the
> roundtrip to SMILES generates incorrect stereo.
>
> If it's something that interests you, I can see if Naruki can provide more
> detail on the current state - he implemented RDKit's 4-dimensional
> minimization, which helped.. but didn't solve the problem.
>
> I can think of a few possible issues because i don't fully understand the
> stereo code;
> - We're saving the stereo incorrectly - and the original version gets
> corrupted (e.g., something's a pointer when it should have been a deep copy)
> - I'm mis-understanding the stereo code - which was true at first because
> I was using atom index vs. id
> - The conversion from 3D geometries to stereo is somehow not right - maybe
> due to lazy perception issues
>
> -Geoff


There seem to be several issues here.

1)  Handling of implicit hydrogens

Implicit hydrogens are not handled correctly. A partial workaround is
implemented, but this should be extended for the view from/towards atom.
This prevents segfaults.

https://github.com/openbabel/openbabel/blob/master/src/distgeom.cpp#L188

2) Handling of unspecified stereochemistry in CheckStereoConstraints

The call to StereoFrom3D will delete all previously stored stereochemistry
objects. This means that unspecified stereochemistry will become specified.
For example, input molecules CC(I)F or CC=CC will always fail this test.
This will result in many unnecessary iterations. I already have a patch for
this.

3) Handling of unpsecified stereochemistry in SMILES-SDF roundtrip test

Using the SDF file format for the rountrip tests is also problematic.
Although we correctly write out unspecified atom stereo parities, these are
ignored (by default) when reading an SDF file. We have an option (-as) to
use these and ignore the 3D geometry. However, ideally we would also have
an option to use 3D geometry to determine the stereochemistry when
specified (atom parity != 3) and use atom parity 3 to mark the
stereochemistry unspecified (i.e. a hybrid reading mode).

*Does this sound reasonable? If so, I'll implement this. This is currently
holding me back from being able to test on larger datasets.*

4) Wrong stereochemistry when using --gen3D

These are cases that pass the OBDistanceGeometry::CheckStereoConstraints
test, but still fail the roundtrip (w/o unspecified stereochemistry).

RDKit has some additional checks here which I implemented (_volumeTest and
_centerInVolume) and these catch some issues. However, it didn't fix the
problem.

After doing some more testing, it seems the problem is not really in the
distance geometry code itself. That is, "valid" structures are returned
which become invalid after energy minimization later in the gen3d
operation. Adding an iteration loop to gen3d with a check similar to the
one in OBDistanceGeometry is able to catch these cases.

I currently have a patch for this which makes a copy of the molecule. This
could be made more efficient but in the short term, it would already be an
improvement.


In any case, finding a solution to 3 will help to robustly test a more
efficient version of this fix. Also, the CheckStereoContraints could be
made more efficient, but I'd like to focus on making it work correctly
first.

Tim
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Reply via email to