Incidentally, I believe ChemAxon is the only one producing these molfiles
with aromatic bonds. Certainly CDK/RDKit/OpenBabel/OEChem don't, though I
think Indigo used to generate them in older versions.
$ obabel -ismi -:'c1ccccc1' -omol
OpenBabel01031816072D
6 6 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 6 1 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
3 4 2 0 0 0 0
4 5 1 0 0 0 0
5 6 2 0 0 0 0
M END
1 molecule converted
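
For anyone wanting to check their own files for these bonds before reading
them, here is a rough sketch in plain Java (no CDK needed). It reads the atom
and bond counts from the V2000 counts line and reports which bond-block
entries have type 4. The class name AromaticBondCheck, its methods and the
small thiophene-like sample are made up for illustration. It assumes a
well-formed fixed-width V2000 file (three header lines, three-character count
and bond fields), so it will not cope with files whose column padding has
been stripped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CDK: flags bond type 4 in a V2000 molfile.
final class AromaticBondCheck {

    // Returns the 1-based positions (within the bond block) of type-4 bonds.
    static List<Integer> aromaticBondLines(String molfile) {
        String[] lines = molfile.split("\r?\n", -1);
        // V2000 layout: three header lines, then the counts line.
        if (lines.length < 4 || !lines[3].contains("V2000")) {
            throw new IllegalArgumentException("not a V2000 molfile");
        }
        String counts = lines[3];
        int nAtoms = Integer.parseInt(counts.substring(0, 3).trim());
        int nBonds = Integer.parseInt(counts.substring(3, 6).trim());
        if (lines.length < 4 + nAtoms + nBonds) {
            throw new IllegalArgumentException("truncated atom/bond block");
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < nBonds; i++) {
            String bond = lines[4 + nAtoms + i];
            // Bond line columns: atom1 (1-3), atom2 (4-6), bond type (7-9).
            int type = Integer.parseInt(bond.substring(6, 9).trim());
            if (type == 4) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    // Builds a small, correctly padded sample: a five-membered ring written
    // with type-4 bonds plus one ordinary single bond to a substituent.
    static String sampleMolfile() {
        String[] symbols = {"S", "C", "C", "C", "C", "C"};
        int[][] bonds = {{1, 2, 4}, {2, 3, 4}, {3, 4, 4}, {4, 5, 4}, {1, 5, 4}, {5, 6, 1}};
        StringBuilder sb = new StringBuilder();
        sb.append("\n  sketch\n\n"); // title, program line, comment
        sb.append(String.format("%3d%3d  0  0  0  0  0  0  0  0999 V2000\n",
                symbols.length, bonds.length));
        for (String symbol : symbols) {
            sb.append(String.format(
                    "    0.0000    0.0000    0.0000 %-3s 0  0  0  0  0  0  0  0  0  0  0  0\n",
                    symbol));
        }
        for (int[] b : bonds) {
            sb.append(String.format("%3d%3d%3d  0  0  0  0\n", b[0], b[1], b[2]));
        }
        sb.append("M  END\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> hits = aromaticBondLines(sampleMolfile());
        System.out.println("type-4 bonds at bond-block entries: " + hits);
    }
}
```

A non-empty result means the file uses the query bond type, which is the case
where CDK leaves the molecule in an inconsistent state (see the quoted reply
below).
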
On 3 January 2018 at 15:58, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
> John,
>
> Thanks for the detailed response. I think it will be useful to be able to
> depict aromatic bonds, and, as you mention, the main proper use for this
> will be for query structures and fragments. However, many structures out
> there in the wild do use aromatic bonds, so I think it's useful to have it
> for normal structures too.
>
> Tim
>
> p.s. when I referred to the dotted bond as 'ANY' bond, this is the
> notation that ChemAxon uses to depict this type of query bond. But I guess
> that's not an absolute standard.
>
> On 03/01/18 14:03, John Mayfield wrote:
>
> I'll answer these back to front as the second one is much simpler to
> answer:
>
>> 1. Why the inconsistency in how the different parsers/readers behave? Is
>> this documented anywhere?
>> 2. Is it possible to have the aromatic bonds depicted as proper aromatic
>> bonds?
>
>
> Answer 2: I don't think the donuts are useful for plain old structures,
> only query structures. The circles also do not scale well to all cases
> (porphyrin is a classic). The dashed bond in the depiction is not really
> 'any' bond as you say but rather "your input was junk/had missing
> information" (see the next Answer on why that is). Since, as I said, query
> structures do need the 'delocalised' bond depiction, I've updated the
> renderer accordingly. For now I've just done an offset dash but will try
> and find time to add in the donuts: https://github.com/cdk/cdk/pull/403
>
> Answer 1: The short answer is CDK matches behaviour to what Daylight does
> for SMILES and what MDL/Symyx/Accelrys/BIOVIA do for molfile. You can
> safely use aromatic bond types in SMILES and not in CTfiles.
>
> In CDK aromaticity is a bond property and not a type/order, that is to say
> the bond order is independent of the aromatic status of the bond. The
> "normal form" of a molecule in the CDK is to have all the hydrogen counts
> and bond orders set; if this is not so you will get warnings/exceptions all
> over the place. A molecule can be in an inconsistent state if an input
> format was invalid or you create it that way manually. As I'm sure you
> know, bond type = 4 in CTfiles is a query feature. If you use it to
> represent a discrete structure there is no way to know what the original
> representation was. If I try to read your structure with BIOVIA I get an
> error:
>
>> ORA-20100: MDL-1919: Molecule failed registration check:
>> Error: (root) No query features allowed for registration
>> MDL-0633: Unable to convert molfile string to binary molecule ctab
>> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 359
>> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 352
>> ORA-06512: at "C$DIRECT2017.MDLAUXOP", line 335
>
>
> I've written a wiki section to help explain why the problem exists:
> https://github.com/cdk/cdk/wiki/CTfile-Reading#aromatic-query-bonds.
> Rather than reject molfiles with aromatic bonds outright we leave the
> molecule in an inconsistent state, as a user knows their data better than
> we do and may be able to correct it. SMILES will automatically kekulize
> input because it can safely do so.
>
> Hope that helps,
> John
>
>
> On 26 December 2017 at 14:46, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
>
>> I've noticed that if you try to depict a structure in molfile format that
>> has bonds in rings defined as aromatic type then they are depicted as any
>> bonds (dashed), not aromatic (donuts). For example take this molfile:
>>
>>
>> Mrv17a0 10061711272D
>>
>> 14 15 0 0 0 0 999 V2000
>> 0.5420 0.2323 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> 1.2564 -0.1802 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> 1.2564 -1.0052 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
>> 1.9239 -1.4901 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> 2.7085 -1.2352 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
>> 3.3216 -1.7872 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> 1.6689 -2.2748 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
>> 0.8439 -2.2748 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
>> 0.5890 -1.4901 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> -0.1956 -1.2352 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> -0.8631 -1.7201 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> -1.5305 -1.2352 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> -1.2756 -0.4506 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
>> -0.4506 -0.4506 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
>> 1 2 1 0 0 0 0
>> 2 3 1 0 0 0 0
>> 3 4 4 0 0 0 0
>> 4 5 1 0 0 0 0
>> 5 6 1 0 0 0 0
>> 4 7 4 0 0 0 0
>> 7 8 4 0 0 0 0
>> 8 9 4 0 0 0 0
>> 3 9 4 0 0 0 0
>> 9 10 1 0 0 0 0
>> 10 11 4 0 0 0 0
>> 11 12 4 0 0 0 0
>> 12 13 4 0 0 0 0
>> 13 14 4 0 0 0 0
>> 10 14 4 0 0 0 0
>> M END
>>
>> Some of the bonds are clearly aromatic (4 in the 3rd column of the bond
>> block). But when rendering with code like this you get those bonds depicted
>> as dashed bonds:
>>
>> String mol = ...
>> DepictionGenerator dg = new DepictionGenerator()
>> .withTerminalCarbons()
>> .withSize(500d, 400d)
>> .withFillToFit()
>>
>> MDLV2000Reader v2000Parser = new MDLV2000Reader(new ByteArrayInputStream(mol.getBytes()))
>> IAtomContainer atomContainer = v2000Parser.read(new AtomContainer())
>> Depiction depiction = dg.depict(atomContainer)
>> depiction.writeTo("png", "/tmp/mol.png")
>>
>> This is using either CDK 2.0 or 2.1.
>>
>> If you try a similar thing with the same molecule in smiles format the
>> behaviour is a bit different.
>>
>> String mol2 = 'CCn1c(SC)nnc1-c1cccs1'
>> SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
>> IAtomContainer atomContainer2 = parser.parseSmiles(mol2)
>>
>> In this case the molecule gets depicted in kekule form. This seems to be
>> because by default the smiles parser kekulises the molecule (unlike the
>> MDLV2000Reader) though you can turn this off:
>> parser.kekulise(false)
>> in which case you get the molecule depicted with dashed bonds again.
>>
>> So the key questions:
>>
>> 1. Why the inconsistency in how the different parsers/readers behave? Is
>> this documented anywhere?
>>
>> 2. Is it possible to have the aromatic bonds depicted as proper aromatic
>> bonds?
>>
>> Tim
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>
>>
>
>