Re: [Cdk-user] aromatic bonds depicted as any bonds

Tim Dudgeon Wed, 03 Jan 2018 07:58:57 -0800

John,

Thanks for the detailed response. I think it will be useful to be able to depict aromatic bonds and, as you mention, the main proper use for this will be for query structures and fragments. However, many structures out there in the wild do use aromatic bonds, so I think it's useful to have it for normal structures too.

Tim

P.S. When I referred to the dotted bond as an 'ANY' bond, I was using the notation that ChemAxon uses to depict this type of query bond. But I guess that's not an absolute standard.



On 03/01/18 14:03, John Mayfield wrote:

I'll answer these back to front, as the second one is much simpler to answer:


    1. Why the inconsistency in how the different parsers/readers
    behave? Is this documented anywhere?
    2. Is it possible to have the aromatic bonds depicted as proper
    aromatic bonds?

Answer 2: I don't think the donuts are useful for plain old structures, only query structures. The circles also do not scale well to all cases (porphyrin is a classic). The dashed bond in the depiction is not really an 'any' bond as you say, but rather "your input was junk/had missing information" (see the next answer on why that is). Since, as I said, query structures need the 'delocalised' bond depiction, I've updated the renderer accordingly. For now I've just done an offset dash, but I will try to find time to add in the donuts: https://github.com/cdk/cdk/pull/403

Answer 1: The short answer is that CDK matches its behaviour to what Daylight does for SMILES and what MDL/Symyx/Accelrys/BIOVIA do for molfiles. You can safely use aromatic bond types in SMILES but not in CTfiles.

In CDK, aromaticity is a bond property and not a type/order; that is to say, the bond order is independent of the aromatic status of the bond. The "normal form" of a molecule in the CDK is to have all the hydrogen counts and bond orders set; if this is not so, you will get warnings/exceptions all over the place. A molecule can be in an inconsistent state if an input format was invalid or if you create it that way manually. As I'm sure you know, bond type = 4 in CTfiles is a query feature; if you use it to represent a discrete structure, there is no way to know what the original representation was. If I try to read your structure with BIOVIA I get an error:


    ORA-20100: MDL-1919: Molecule failed registration check:
    Error: (root) No query features allowed for registration
    MDL-0633: Unable to convert molfile string to binary molecule ctab
    ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 359
    ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 352
    ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 335
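
As a rough illustration of the point that aromaticity is a flag independent of bond order, here is a minimal Java sketch. The class BondModelSketch, its Order enum and the allOrdersSet check are hypothetical names made up for this example, not CDK classes, and modelling a molfile type 4 bond as "aromatic flag set, order unset" is an assumption about how such input ends up.

```java
import java.util.List;

/** Hypothetical model (not CDK's classes): bond order and aromatic flag are independent. */
final class BondModelSketch {

    enum Order { UNSET, SINGLE, DOUBLE, TRIPLE }

    static final class Bond {
        final Order order;
        final boolean aromatic;

        Bond(Order order, boolean aromatic) {
            this.order = order;
            this.aromatic = aromatic;
        }
    }

    /** "Normal form" check for this sketch only: every bond has a concrete order. */
    static boolean allOrdersSet(List<Bond> bonds) {
        for (Bond b : bonds) {
            if (b.order == Order.UNSET) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Kekulized aromatic bonds: flag set AND a concrete order.
        List<Bond> kekulized = List.of(
                new Bond(Order.SINGLE, true),
                new Bond(Order.DOUBLE, true));
        // Assumed result of reading bond type 4 as-is: flag set, order unknown.
        List<Bond> fromType4 = List.of(new Bond(Order.UNSET, true));

        System.out.println(allOrdersSet(kekulized));
        System.out.println(allOrdersSet(fromType4));
    }
}
```

In this sketch only the first list is in "normal form"; the second is the kind of inconsistent state described above.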

I've written a wiki section (https://github.com/cdk/cdk/wiki/CTfile-Reading#aromatic-query-bonds) to help explain why the problem exists. Rather than reject molfiles with aromatic bonds outright, we leave the molecule in an inconsistent state, as a user knows their data better than us and may be able to correct it. The SMILES parser will automatically kekulize input because it can safely do so.
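
Since bond type 4 is what leads to the inconsistent state, one option is to scan the bond block yourself before handing a molfile to a reader. This is a minimal sketch, assuming a well-formed V2000 molfile with the fixed-width counts line on line 4 and no leading indentation on any line; AromaticQueryBondCheck and aromaticQueryBonds are hypothetical names, not part of CDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of CDK): finds bond type 4 in a V2000 bond block. */
final class AromaticQueryBondCheck {

    /** Returns the 1-based indices of bonds whose type field is 4. */
    static List<Integer> aromaticQueryBonds(String molfile) {
        String[] lines = molfile.split("\r?\n", -1);
        if (lines.length < 4 || lines[3].length() < 6) {
            throw new IllegalArgumentException("no V2000 counts line found");
        }
        // Counts line (line 4): columns 1-3 = atom count, columns 4-6 = bond count.
        int nAtoms = Integer.parseInt(lines[3].substring(0, 3).trim());
        int nBonds = Integer.parseInt(lines[3].substring(3, 6).trim());
        if (lines.length < 4 + nAtoms + nBonds) {
            throw new IllegalArgumentException("molfile is shorter than its counts line claims");
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < nBonds; i++) {
            String bond = lines[4 + nAtoms + i];
            if (bond.length() < 9) {
                throw new IllegalArgumentException("bond line " + (i + 1) + " is too short");
            }
            // Bond line: columns 1-3 and 4-6 = atom indices, columns 7-9 = bond type.
            int type = Integer.parseInt(bond.substring(6, 9).trim());
            if (type == 4) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up three-atom fragment; only the counts line and bond block matter here.
        String molfile = String.join("\n",
                "example",
                "  hypothetical",
                "",
                "  3  2  0  0  0  0            999 V2000",
                "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
                "    1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
                "    2.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
                "  1  2  1  0  0  0  0",
                "  2  3  4  0  0  0  0",
                "M  END");
        System.out.println("Type 4 bonds: " + aromaticQueryBonds(molfile));
    }
}
```

A non-empty result tells you the file uses the query bond type, so you can decide whether to fix the source data or assign bond orders after reading.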


Hope that helps,
John

On 26 December 2017 at 14:46, Tim Dudgeon <tdudgeon...@gmail.com> wrote:


    I've noticed that if you try to depict a structure in molfile
    format that has bonds in rings defined as aromatic type then they
    are depicted as any bonds (dashed), not aromatic (donuts). For
    example take this molfile:


      Mrv17a0 10061711272D

     14 15  0  0  0  0            999 V2000
        0.5420    0.2323    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
        1.2564   -0.1802    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
        1.2564   -1.0052    0.0000 N   0  0  0  0 0  0  0  0  0  0  0  0
        1.9239   -1.4901    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
        2.7085   -1.2352    0.0000 S   0  0  0  0 0  0  0  0  0  0  0  0
        3.3216   -1.7872    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
        1.6689   -2.2748    0.0000 N   0  0  0  0 0  0  0  0  0  0  0  0
        0.8439   -2.2748    0.0000 N   0  0  0  0 0  0  0  0  0  0  0  0
        0.5890   -1.4901    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
       -0.1956   -1.2352    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
       -0.8631   -1.7201    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
       -1.5305   -1.2352    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
       -1.2756   -0.4506    0.0000 C   0  0  0  0 0  0  0  0  0  0  0  0
       -0.4506   -0.4506    0.0000 S   0  0  0  0 0  0  0  0  0  0  0  0
      1  2  1  0  0  0  0
      2  3  1  0  0  0  0
      3  4  4  0  0  0  0
      4  5  1  0  0  0  0
      5  6  1  0  0  0  0
      4  7  4  0  0  0  0
      7  8  4  0  0  0  0
      8  9  4  0  0  0  0
      3  9  4  0  0  0  0
      9 10  1  0  0  0  0
     10 11  4  0  0  0  0
     11 12  4  0  0  0  0
     12 13  4  0  0  0  0
     13 14  4  0  0  0  0
     10 14  4  0  0  0  0
    M  END

    Some of the bonds are clearly aromatic (4 in the 3rd column of the
    bond block). But when rendering with code like this you get those
    bonds depicted as dashed bonds:

    String mol = ...
    DepictionGenerator dg = new DepictionGenerator()
                    .withTerminalCarbons()
                    .withSize(500d, 400d)
                    .withFillToFit()

    MDLV2000Reader v2000Parser = new MDLV2000Reader(new ByteArrayInputStream(mol.getBytes()))
    IAtomContainer atomContainer = v2000Parser.read(new AtomContainer())
    Depiction depiction = dg.depict(atomContainer)
    depiction.writeTo("png", "/tmp/mol.png")


    This is using either CDK 2.0 or 2.1.

    If you try a similar thing with the same molecule in smiles format
    the behaviour is a bit different.

    String mol2 = 'CCn1c(SC)nnc1-c1cccs1'
    SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
    IAtomContainer atomContainer2 = parser.parseSmiles(mol2)

    In this case the molecule gets depicted in kekule form. This seems
    to be because by default the smiles parser kekulises the molecule
    (unlike the MDLV2000Reader) though you can turn this off:
    parser.kekulise(false)
    in which case you get the molecule depicted with dashed bonds again.

    So the key questions:

    1. Why the inconsistency in how the different parsers/readers
    behave? Is this documented anywhere?

    2. Is it possible to have the aromatic bonds depicted as proper
    aromatic bonds?

    Tim


    
------------------------------------------------------------------------------
    Check out the vibrant tech community on one of the world's most
    engaging tech sites, Slashdot.org! http://sdm.link/slashdot
    _______________________________________________
    Cdk-user mailing list
    Cdk-user@lists.sourceforge.net <mailto:Cdk-user@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/cdk-user
    <https://lists.sourceforge.net/lists/listinfo/cdk-user>
