The issue is only regarding cis/trans stereo so I'll confine my comments to
that. (We do store tet stereo twice, as well as provide an option to read
the chirality flag if desired.)

You assume that there is a convention for storing cis/trans stereo in the
bond block, but that OB is not following it. There is no such convention,
whether for 3D, 2D or 0D. For cis/trans stereo, the bond stereo flags are
only used to indicate unknown double bond stereo (via two separate
conventions, either on the double bond itself or on an attached single bond).
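
To make the two "unknown" conventions concrete, here is a minimal sketch of how a reader might decode the stereo field of a V2000 bond line. It assumes the fixed 3-character columns and the stereo codes listed in the CTfile spec; the helper name decode_bond_line is hypothetical and not part of Open Babel.

```python
# Meaning of the V2000 bond-stereo field, per the CTfile spec.
SINGLE_BOND_STEREO = {0: "not stereo", 1: "up", 4: "either", 6: "down"}
DOUBLE_BOND_STEREO = {
    0: "cis/trans taken from coordinates",
    3: "cis or trans (either)",
}


def decode_bond_line(line):
    """Decode one V2000 bond line (hypothetical helper, illustration only).

    The first four fields are 3 characters wide each:
    begin atom, end atom, bond order, bond stereo.
    """
    begin, end, order, stereo = (int(line[i:i + 3]) for i in range(0, 12, 3))
    # Only double bonds use the double-bond table; in this sketch every
    # other bond order falls through to the single-bond table.
    table = DOUBLE_BOND_STEREO if order == 2 else SINGLE_BOND_STEREO
    return begin, end, order, table.get(stereo, "undefined by the spec")


for bond in ["  1  2  1  1  0  0", "  2  3  2  3  0  0", "  3  4  1  6  0  0"]:
    print(decode_bond_line(bond))
```

Read this way, a 3 on a double bond or a 4 on an attached single bond only says "unknown"; neither table has a value that means cis or trans.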

Now...we can implement a way to do this, using two Up bonds (basically how
SMILES are stored) for example, and that's what I had done earlier. But
Open Babel will be the only software that will read this correctly as it's
not even hinted at in the spec. Other software will regard the cis/trans
stereo as undefined.
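
As a rough sketch of what such an OB-only convention would mean for a reader, the function below maps the stereo flags on the two single bonds flanking a double bond to cis or trans. It assumes the rule quoted later in this thread (the same mark at both ends means cis, opposite marks mean trans) and ignores the atom order within each bond, which a real implementation would have to handle. The function name is hypothetical.

```python
UP, DOWN = 1, 6  # V2000 single-bond stereo codes


def cis_trans_from_flags(left_flag, right_flag):
    """Hypothetical reader for the non-standard 0D convention discussed here.

    left_flag and right_flag are the stereo codes of one single bond on
    each side of the double bond. Same marks mean cis, opposite marks
    mean trans, and anything else is treated as unspecified.
    """
    marks = {UP, DOWN}
    if left_flag not in marks or right_flag not in marks:
        return "unspecified"
    return "cis" if left_flag == right_flag else "trans"


print(cis_trans_from_flags(UP, DOWN))
print(cis_trans_from_flags(UP, UP))
print(cis_trans_from_flags(0, 0))
```

Software that follows only the published spec has no such rule, so it would see two ordinary wedge bonds and no double bond stereo.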

So...which is the lesser evil? For OB to implement a self-consistent
cis/trans stereo representation for 0D SDF which is unrecognised by other
software (thus causing silent loss of information), or for OB to not
support storing cis/trans stereo for 0D SDF (which currently causes a
warning about loss of stereo), a behaviour which is consistent with other
software?

It's not a rhetorical question - I'm happy to be convinced either way
(especially as the code is already written), as until now, there's been
no-one interested in discussing this.

- Noel

On 14 May 2012 02:05, Craig James <cja...@emolecules.com> wrote:

> Hi Noel,
>
> Thanks for the pointer to your blog post ... it explains the issue well.
> I'll address the topic here, but let me know if it would be better to post
> on your blog for completeness or on the OB list for wider distribution.
>
> My overall answer to this whole question is that it's always a mistake to
> lose information -- particularly in a toolkit like OpenBabel.  The primary
> *raison d'etre* of OpenBabel is to communicate between different file
> formats with the greatest fidelity possible.  With this change, we have a
> situation where a round-trip between two formats loses critical molecular
> information where previously it didn't.
>
> I see it as more of a pragmatic question than anything else.  There is a
> way to keep the information, so why not do it?
>
> The origin of this problem is the age-old complaint that the SD File
> Format has both ambiguity and redundancy.  Each developer interprets the
> spec differently and chaos results. My philosophy has always been to err on
> the side of too much information rather than just enough or too little.
> When a stereo center is present, mark it every way possible.  When a
> cis/trans bond is present, use both the 2D coordinates and the bond labels.
>
> From your blog:
> > My current understanding is that where 3D coordinates are present,
> > there's no need to store stereochemical information in either the atom
> > parity or the bond block. I think I'll probably set the atom parity
> > anyway (since I've already written the code, and it helps when you look
> > at the file to be able to easily identify the chiral centers).
>
> There are three reasons why you should store stereo information everywhere.
>
> First, because there's no reason not to (what's the harm?).
>
> Second, it's often used to designate partially-known stereochemistry.
> It's common for a molecule to have both known and unknown stereo centers.
> SMILES handles this because each stereo center is specified independently.
> People often will generate 3D coordinates for a molecule even though they
> don't know each stereo center -- they just arbitrarily pick a configuration
> for the unknown centers.  By marking some centers' parity bits or up/down
> bonds and leaving others out, you can make it clear that the
> stereochemistry is partially known.  (It would be nice if this were written
> into the CTFile specification.)
>
> And third, there are applications out there that rely on the atom parity
> and bond blocks to specify chirality.  It's a bit of work to do the
> geometry to deduce stereochemistry from 3D coordinates, so many apps just
> count on the atom-parity bit or bond block.  My recollection is that
> Daylight's SDF-to-SMILES conversion programs used the atom parity and bond
> up/down flags if they could, and only used the 3D geometry as a last resort.
>
> > For 2D coordinates, there's no need to store the bond stereochemistry
> > (as this can be worked out from the coordinates), but chirality needs to
> > be stored explicitly. The normal way to store this is not using atom
> > parity (but I'll set this anyway for the same reasons as above), but by
> > setting one of the bonds on the tetrahedral center to up or down.
>
> This is true in theory but useless in practice.  The first argument above
> ("what's the harm?") applies here too.  But more importantly, most molecule
> editors and 2D generators (including OpenBabel!) will use 120-degree bonds
> on every double bond they draw or lay out.  And in almost all cases, by
> default they draw the trans configuration.  In real life, a chemist will
> often draw a double bond in the trans configuration without actually
> knowing (or caring) whether it's cis or trans.
>
> And like the 3D information, it's often the case that one double-bond's
> configuration is known while another's is not.  If you assume that you can
> derive the cis/trans configuration from the 2D coordinates, then there's no
> way to represent the information in "CC=CC/C=C/C".  On the other hand, by
> using the up/down bond flags, you can represent this molecule correctly.
>
> > For 0D coordinates, there are no guidelines. I propose to store
> > cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
> > ends of a double bond means cis),
>
> But right now OpenBabel isn't even doing this.  It's just discarding the
> cis/trans information.
>
> > and chirality using the atom parity. The MDL spec states that atom
> > parity should be ignored when read, but the alternative is to just
> > forget the stereochemistry, or else to store both cis/trans stereo *and*
> > chirality in the bond block, which may just about be possible but is
> > likely to be a real mess.
>
> Here again, I'd argue for putting the information everywhere possible for
> reasons of portability. The CTFile spec, combined with various heroic
> attempts to work around its shortcomings, means that for every possible
> choice of how to write the chirality there's at least one app that does it
> that way.  If OpenBabel can write correct SD Files that put redundant but
> consistent chiral specifications (i.e. use 3D, atom parity and bond flags),
> then why not?
>
> Here's a more pragmatic argument.  In OB 2.3.1, the only way to get a
> correct round-trip SMILES-SDF-SMILES generation is to use --gen2D.  That
> requires a very expensive and unnecessary *ab initio* calculation of 2D
> coordinates.  For many real molecules, generating 2D coordinates can be 10x
> or 100x slower than merely parsing the molecule ... and it was completely
> unnecessary in OB 2.2.x.
>
> And more to the point, this is a showstopper for us.  In our experience,
> most pharmaceutical researchers use SMILES for molecular modeling,
> diversity analysis, toxicology analysis and so forth. Once they decide what
> to buy, they may send us the SMILES, or may send us SD Files. These files
> can range from a few compounds to hundreds of thousands of compounds.  It
> would be a disaster if the cis/trans information was lost at the end of
> this time-consuming analysis just because they (or we) converted their
> SMILES to SDF format using OpenBabel before buying the compounds.
>
> Since I know about this problem, eMolecules can exercise diligence and
> never do a SMILES-to-SDF conversion.  But customers might not be aware of
> this restriction -- they use OpenBabel because it is known to be good at
> file-format conversion.  It would be really unpleasant for us to have to
> explain to a customer that they'd ordered hundreds of incorrect compounds
> because OpenBabel doesn't handle cis/trans the way you'd expect.
>
> Thanks,
> Craig
>
>
>
> On Sat, May 12, 2012 at 6:37 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
>
>> Sorry - I got things backward. It's storing the cis/trans stereochemistry
>> in a 0D format that's the problem. See the post and comments at
>> http://baoilleach.blogspot.com/2010/02/how-to-store-stereochemistry-in-mol.html
>>
>> - Noel
>>
>> On 12 May 2012 13:52, Noel O'Boyle <baoille...@gmail.com> wrote:
>>
>>> It's intentional, rather than a bug. I originally had some code in there
>>> to support stereo in 0D SDF, but the format really doesn't support this
>>> officially - it's supposed to be either 2D or 3D. It's all very well for
>>> cis/trans, but it's not possible to store tet stereo without coordinates
>>> (which aren't present in 0D) or tet parities (which the spec explicitly
>>> says to ignore on reading).
>>>
>>> In short, we could support this, but Open Babel would be the only
>>> software to do so, and these 0D SDF files would not be handled correctly by
>>> others...
>>>
>>> In the meantime, if you use --gen2D or --gen3D it will work fine.
>>>
>>> - Noel
>>>
>>> On 11 May 2012 23:48, Craig James <cja...@emolecules.com> wrote:
>>>
>>>>  This looks bad:
>>>>
>>>>    echo "C/C=C/C" | babel -i smi -o sdf | babel -i sdf -o can
>>>>    CC=CC
>>>>
>>>> Notice the cis/trans bonds are lost.  In OB 2.2.x, it works correctly:
>>>>
>>>>    echo "C/C=C/C" | babel -i smi -o sdf | babel -i sdf -o can
>>>>     C/C=C/C
>>>>
>>>> The problem seems to be here in 2.3.x:
>>>>
>>>>    echo "C/C=C/C" | babel -i smi -o sdf
>>>>
>>>>     OpenBabel05111215342D
>>>>
>>>>      4  3  0  0  0  0  0  0  0  0999 V2000
>>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>        0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>>      1  2  1  0  0  0  0
>>>>      2  3  2  0  0  0  0
>>>>      3  4  1  0  0  0  0
>>>>     M  END
>>>>    $$$$
>>>>
>>>> Notice that the bond block has no stereo (cis/trans) markings.  Do the
>>>> same thing in 2.2.x and the cis/trans bonds are properly marked:
>>>>
>>>>     echo "C/C=C/C" | babel -i smi -o sdf
>>>>
>>>>      OpenBabel05111215352D
>>>>
>>>>       4  3  0  0  0  0  0  0  0  0999 V2000
>>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>>         0.0000    0.0000    0.0000 C   0  0  0  0  0
>>>>       1  2  1  1  0  0
>>>>       2  3  2  3  0  0
>>>>       3  4  1  6  0  0
>>>>     M  END
>>>>
>>>>     $$$$
>>>>
>>>> The bond block is correct here in this output from 2.2.x.
>>>>
>>>> Any ideas when this might have happened and if it was intentional?
>>>>
>>>> Thanks,
>>>> Craig
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Live Security Virtual Conference
>>>> Exclusive live event will cover all the ways today's security and
>>>> threat landscape has changed and how IT managers can respond.
>>>> Discussions
>>>> will include endpoint security, mobile security and the latest in
>>>> malware
>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> _______________________________________________
>>>> OpenBabel-Devel mailing list
>>>> OpenBabel-Devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>>>
>>>>
>>>
>>
>