Incidentally, I believe ChemAxon is the only toolkit producing these molfiles
with aromatic bonds. Certainly CDK/RDKit/OpenBabel/OEChem don't; I think
Indigo used to generate them in older versions.

$ obabel -ismi -:'c1ccccc1' -omol

OpenBabel01031816072D

  6  6  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  6  1  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
M  END
1 molecule converted


On 3 January 2018 at 15:58, Tim Dudgeon <tdudgeon...@gmail.com> wrote:

> John,
>
> Thanks for the detailed response. I think it will be useful to be able to
> depict aromatic bonds and, as you mention, the main proper use for this
> will be for query structures and fragments. However, many structures out
> there in the wild do use aromatic bonds, so I think it's useful to have it
> for normal structures too.
>
> Tim
>
> P.S. When I referred to the dotted bond as an 'ANY' bond, I was using the
> notation that ChemAxon uses to depict this type of query bond. But I guess
> that's not an absolute standard.
>
> On 03/01/18 14:03, John Mayfield wrote:
>
> I'll answer these back to front as the second one is much simpler to
> answer:
>
> 1. Why the inconsistency in how the different parsers/readers behave? Is
>> this documented anywhere?
>> 2. Is it possible to have the aromatic bonds depicted as proper aromatic
>> bonds?
>
>
> Answer 2: I don't think the donuts are useful for plain old structures,
> only for query structures. The circles also do not scale well to all cases
> (porphyrin is a classic example). The dashed bond in the depiction does not
> really mean 'any' bond, as you say, but rather "your input was junk/had
> missing information" (see Answer 1 below for why that is). Since, as I said,
> query structures do need a 'delocalised' bond depiction, I've updated the
> renderer accordingly. For now I've just done an offset dash, but I will try
> to find time to add the donuts: https://github.com/cdk/cdk/pull/403
>
> Answer 1: The short answer is that CDK matches its behaviour to what Daylight
> does for SMILES and what MDL/Symyx/Accelrys/BIOVIA do for molfiles. You can
> safely use aromatic bond types in SMILES but not in CTfiles.
>
> In CDK, aromaticity is a bond property and not a type/order; that is to say,
> the bond order is independent of the aromatic status of the bond. The
> "normal form" of a molecule in the CDK is to have all the hydrogen counts
> and bond orders set; if this is not so, you will get warnings/exceptions all
> over the place. A molecule can be in an inconsistent state if an input
> format was invalid or if you created it that way manually. As I'm sure you
> know, bond type 4 in CTfiles is a query feature; if you use it to
> represent a discrete structure, there is no way to know what the original
> representation was. If I try to read your structure with BIOVIA, I get an
> error:
>
> ORA-20100: MDL-1919: Molecule failed registration check:
>> Error: (root) No query features allowed for registration
>> MDL-0633: Unable to convert molfile string to binary molecule ctab
>> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 359
>> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 352
>> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 335
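Because type 4 is a query bond type (hence BIOVIA's "No query features allowed for registration" error above), a molfile can be screened for it before it is loaded. The Java sketch below is a minimal, stdlib-only illustration, not CDK API: `QueryBondScan`, `aromaticQueryBonds()` and the `benzene()` fixture are hypothetical names, and it assumes a well-formed V2000 file with the counts line on line 4 and the bond type in columns 7-9 of each bond line (V3000 is not handled).

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, NOT CDK API: list the V2000 bonds whose type field
// (columns 7-9 of a bond line) is 4, the CTfile "aromatic" query bond type.
// Assumes a well-formed V2000 molfile; V3000 and property lines are ignored.
class QueryBondScan {

    // Returns the 1-based indices of bonds with type 4.
    static List<Integer> aromaticQueryBonds(String molfile) {
        String[] lines = molfile.split("\r?\n", -1);
        String counts = lines[3]; // 3 header lines, then the counts line
        int nAtoms = Integer.parseInt(counts.substring(0, 3).trim());
        int nBonds = Integer.parseInt(counts.substring(3, 6).trim());
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < nBonds; i++) {
            String bond = lines[4 + nAtoms + i]; // bond block follows the atom block
            int type = Integer.parseInt(bond.substring(6, 9).trim());
            if (type == 4) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    // Hypothetical fixture: a benzene molfile (zero coordinates, placeholder
    // header) whose six ring bonds get the given bond types.
    static String benzene(int... bondTypes) {
        StringBuilder sb = new StringBuilder("\n  sketch\n\n");
        sb.append("  6  6  0  0  0  0  0  0  0  0999 V2000\n");
        for (int i = 0; i < 6; i++) {
            sb.append("    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n");
        }
        for (int i = 0; i < 6; i++) {
            sb.append(String.format("%3d%3d%3d  0  0  0  0\n", i + 1, (i + 1) % 6 + 1, bondTypes[i]));
        }
        return sb.append("M  END\n").toString();
    }

    public static void main(String[] args) {
        System.out.println(aromaticQueryBonds(benzene(4, 4, 4, 4, 4, 4))); // [1, 2, 3, 4, 5, 6]
        System.out.println(aromaticQueryBonds(benzene(1, 2, 1, 2, 1, 2))); // []
    }
}
```

A Kekulé form such as the OpenBabel output at the top of this thread yields an empty list, while a ChemAxon-style file like the one in the original question would flag its ring bonds.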
>
>
> I've written a wiki section to help explain why the problem exists:
> https://github.com/cdk/cdk/wiki/CTfile-Reading#aromatic-query-bonds.
> Rather than reject molfiles with aromatic bonds outright, we leave the
> molecule in an inconsistent state, as users know their data better than we
> do and may be able to correct it. The SMILES parser kekulizes input
> automatically because it can safely do so.
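Kekulization can be framed as a perfect-matching problem: once you know which atoms need exactly one double bond, you pick aromatic bonds so that each of those atoms gets one. In SMILES that information is explicit (aromatic nitrogens bearing a hydrogen must be written `[nH]`), whereas type 4 bonds in a molfile may leave it ambiguous. The Java sketch below is a simplified backtracking version of the idea, not CDK's implementation; `KekuleSketch`, `Bond`, `kekulize()` and the `thiophene()` fixture are hypothetical names, and `needsDouble` is assumed to be supplied by the caller. Note that each `Bond` keeps its aromatic flag separately from its order, echoing the CDK model described above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch (NOT CDK's implementation): kekulization as a
// perfect-matching problem solved by backtracking. Each bond keeps an
// aromatic flag separate from its order; only aromatic bonds get orders 1/2.
// Assumption: the caller supplies which atoms need a double bond.
class KekuleSketch {

    static final class Bond {
        final int a, b;
        final boolean aromatic; // aromaticity flag, independent of order
        int order;              // 0 = unset, 1 = single, 2 = double

        Bond(int a, int b, int order, boolean aromatic) {
            this.a = a;
            this.b = b;
            this.order = order;
            this.aromatic = aromatic;
        }
    }

    // Returns true and sets bond orders in place if every atom flagged in
    // needsDouble can get exactly one double bond using only aromatic bonds.
    static boolean kekulize(List<Bond> bonds, boolean[] needsDouble) {
        for (Bond bd : bonds) {
            if (bd.aromatic) bd.order = 1; // start from all-single
        }
        return match(bonds, needsDouble, new boolean[needsDouble.length]);
    }

    private static boolean match(List<Bond> bonds, boolean[] needsDouble, boolean[] matched) {
        int atom = -1; // first atom still missing its double bond
        for (int i = 0; i < needsDouble.length && atom < 0; i++) {
            if (needsDouble[i] && !matched[i]) atom = i;
        }
        if (atom < 0) return true; // perfect matching found
        for (Bond bd : bonds) {
            if (!bd.aromatic || bd.order != 1) continue;
            int other = bd.a == atom ? bd.b : (bd.b == atom ? bd.a : -1);
            if (other < 0 || !needsDouble[other] || matched[other]) continue;
            bd.order = 2;
            matched[atom] = matched[other] = true;
            if (match(bonds, needsDouble, matched)) return true;
            bd.order = 1; // undo and try the next neighbour
            matched[atom] = matched[other] = false;
        }
        return false; // this atom cannot be matched: no Kekule form
    }

    // Hypothetical fixture: the thiophene ring of the example molecule
    // (molfile atoms 10-14 renumbered 0-4; atom 4 is sulfur).
    static List<Bond> thiophene() {
        List<Bond> bonds = new ArrayList<>();
        int[][] ring = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {0, 4}};
        for (int[] r : ring) bonds.add(new Bond(r[0], r[1], 0, true));
        return bonds;
    }

    public static void main(String[] args) {
        List<Bond> ring = thiophene();
        // the divalent sulfur (index 4) takes no double bond
        boolean ok = kekulize(ring, new boolean[] {true, true, true, true, false});
        StringBuilder sb = new StringBuilder();
        for (Bond bd : ring) sb.append(bd.order);
        System.out.println(ok + " " + sb); // prints "true 21211"
    }
}
```

If the sulfur were wrongly flagged as needing a double bond, five atoms would need pairing and `kekulize()` returns false, which is the kind of dead end an underspecified input produces. Backtracking is exponential in the worst case; a production toolkit would typically use a proper maximum-matching algorithm, and the harder part in practice is deciding `needsDouble`, not the matching itself.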
>
> Hope that helps,
> John
>
>
> On 26 December 2017 at 14:46, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
>
>> I've noticed that if you try to depict a structure in molfile format that
>> has ring bonds defined with the aromatic bond type, they are depicted as
>> 'any' bonds (dashed), not aromatic (donuts). For example, take this molfile:
>>
>>
>>   Mrv17a0 10061711272D
>>
>>  14 15  0  0  0  0            999 V2000
>>     0.5420    0.2323    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     1.2564   -0.1802    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     1.2564   -1.0052    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>     1.9239   -1.4901    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     2.7085   -1.2352    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
>>     3.3216   -1.7872    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     1.6689   -2.2748    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>     0.8439   -2.2748    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>     0.5890   -1.4901    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>    -0.1956   -1.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>    -0.8631   -1.7201    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>    -1.5305   -1.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>    -1.2756   -0.4506    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>    -0.4506   -0.4506    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
>>   1  2  1  0  0  0  0
>>   2  3  1  0  0  0  0
>>   3  4  4  0  0  0  0
>>   4  5  1  0  0  0  0
>>   5  6  1  0  0  0  0
>>   4  7  4  0  0  0  0
>>   7  8  4  0  0  0  0
>>   8  9  4  0  0  0  0
>>   3  9  4  0  0  0  0
>>   9 10  1  0  0  0  0
>>  10 11  4  0  0  0  0
>>  11 12  4  0  0  0  0
>>  12 13  4  0  0  0  0
>>  13 14  4  0  0  0  0
>>  10 14  4  0  0  0  0
>> M  END
>>
>> Some of the bonds are clearly aromatic (type 4 in the third column of the
>> bond block). But when rendering with code like this, those bonds are
>> depicted as dashed bonds:
>>
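>> // Groovy: java.io.* (ByteArrayInputStream) is imported by default
>> import org.openscience.cdk.AtomContainer
>> import org.openscience.cdk.depict.Depiction
>> import org.openscience.cdk.depict.DepictionGenerator
>> import org.openscience.cdk.interfaces.IAtomContainer
>> import org.openscience.cdk.io.MDLV2000Reader
>>
>> String mol = ...
>> DepictionGenerator dg = new DepictionGenerator()
>>         .withTerminalCarbons()
>>         .withSize(500d, 400d)
>>         .withFillToFit()
>>
>> MDLV2000Reader v2000Parser = new MDLV2000Reader(new ByteArrayInputStream(mol.getBytes()))
>> IAtomContainer atomContainer = v2000Parser.read(new AtomContainer())
>> v2000Parser.close()
>> Depiction depiction = dg.depict(atomContainer)
>> depiction.writeTo("png", "/tmp/mol.png")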
>>
>> This is using either CDK 2.0 or 2.1.
>>
>> If you try a similar thing with the same molecule in SMILES format, the
>> behaviour is a bit different.
>>
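>> import org.openscience.cdk.interfaces.IAtomContainer
>> import org.openscience.cdk.silent.SilentChemObjectBuilder
>> import org.openscience.cdk.smiles.SmilesParser
>>
>> String mol2 = 'CCn1c(SC)nnc1-c1cccs1'
>> SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
>> IAtomContainer atomContainer2 = parser.parseSmiles(mol2)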
>>
>> In this case the molecule gets depicted in Kekulé form. This seems to be
>> because by default the SMILES parser kekulises the molecule (unlike the
>> MDLV2000Reader), though you can turn this off:
>> parser.kekulise(false)
>> in which case you get the molecule depicted with dashed bonds again.
>>
>> So the key questions:
>>
>> 1. Why the inconsistency in how the different parsers/readers behave? Is
>> this documented anywhere?
>>
>> 2. Is it possible to have the aromatic bonds depicted as proper aromatic
>> bonds?
>>
>> Tim
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>
>>
>
>