Hi Manuel,

Chris is right: unfortunately the ChemDraw export isn't quite correct. It
is actually possible to represent multi-attach in V3000, but it's not used
here. The more common problem is that there is simply a random bond into
the middle of a ring. I've done a fair bit of work on ChemDraw processing (
https://nextmovesoftware.com/blog/2016/07/28/sketchy-sketches/); the
biggest issue is ChemDraw's chemical formula/abbreviation parsing. For
example, K2CO3 comes out with a peroxide, HATU becomes "[H]*[3H][U]", etc.
(I show more examples in the poster).

NextMove has a commercial tool to generate CXSMILES. For your example, note
the *m:* part at the end, which captures the positional variation.

[john@harbinger:Praline]% java -jar exec/target/praline.jar convert
> ~/Downloads/structure.cdx --cxsmi
> [Ru]([P](CCC1=CC=CC=C1)(C2CCCCC2)C3CCCCC3)(Cl)(Cl)*.C1(=CC=C(C=C1)C(C)C)C
> |m:24:25.26.27.28.29.30| structure Molecule/Specific/High/+PVar
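
To make the *m:* field concrete: counting atoms in the SMILES above from 0,
atom 24 is the "*" bonded to Ru and atoms 25-30 are the six ring atoms of
the cymene, so the field says "atom 24 attaches to one of 25..30". Below is
a minimal Java sketch that splits one such field. The class and method
names are made up for illustration, it is not the CDK or NextMove parser,
and it assumes a single entry of the form m:<atom>:<a1>.<a2>... only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CDK: splits one CXSMILES "m:" field.
public class PositionalVariation {

    // Parses a single field such as "m:24:25.26.27.28.29.30" (0-based atom
    // indices) into {attachment atom -> candidate endpoint atoms}.
    // Assumption: exactly one entry, with no surrounding '|' characters.
    static Map<Integer, List<Integer>> parse(String field) {
        String[] parts = field.trim().split(":");
        if (parts.length != 3 || !parts[0].equals("m")) {
            throw new IllegalArgumentException("Not a single m: field: " + field);
        }
        int attach = Integer.parseInt(parts[1]);
        List<Integer> endpoints = new ArrayList<>();
        for (String idx : parts[2].split("\\.")) {
            endpoints.add(Integer.parseInt(idx));
        }
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        result.put(attach, endpoints);
        return result;
    }

    public static void main(String[] args) {
        // The field taken from the CXSMILES output above.
        System.out.println(parse("m:24:25.26.27.28.29.30"));
    }
}
```

Read this way, the "*" is a placeholder rather than a carbon, which is
consistent with the 30 carbons ChemDraw reports.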


CDK can read and handle this, although we do currently still get the
formula wrong (will fix that).

OpenBabel has a FOSS ChemDraw parser, so one option could be to modify that
to parse your examples, extract the info, and then generate the
MOLfile/CXSMILES. The parsing is easy: *NodeType="MultipleAttach"
Attachments="{id1} {id2} .."*, where the ids are node ids. Unfortunately I
don't think they have the data structures to represent it, so it would be a
fair bit of work beyond handling these fields.
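
As a rough illustration of how little is needed to pull those two
attributes out, here is a Java sketch using only the JDK's XML parser on
the XML flavour of the format (CDXML). It is not the OpenBabel code. The
class name, the assumption that nodes are <n> elements carrying an "id"
attribute, and the tiny hand-written input are all illustrative; a real
export carries a DOCTYPE and much more content, which this sketch does not
try to handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists multi-attach nodes found in a CDXML string.
public class MultiAttachScan {

    // Returns {multi-attach node id -> ids of the nodes it attaches to}.
    // Assumption: nodes are <n> elements with an "id" attribute.
    static Map<String, List<String>> find(String cdxml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(cdxml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("n");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element n = (Element) nodes.item(i);
                if (!"MultipleAttach".equals(n.getAttribute("NodeType"))) {
                    continue; // ordinary atom node
                }
                // Attachments is a whitespace-separated list of node ids.
                List<String> ids = new ArrayList<>();
                for (String id : n.getAttribute("Attachments").trim().split("\\s+")) {
                    if (!id.isEmpty()) {
                        ids.add(id);
                    }
                }
                result.put(n.getAttribute("id"), ids);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse CDXML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written fragment for illustration, not real ChemDraw output.
        String xml = "<CDXML><page><fragment>"
            + "<n id=\"1\"/><n id=\"2\"/><n id=\"3\"/>"
            + "<n id=\"9\" NodeType=\"MultipleAttach\" Attachments=\"1 2 3\"/>"
            + "</fragment></page></CDXML>";
        System.out.println(find(xml));
    }
}
```

Extracting the ids is the easy half; the harder part remains carrying the
"one of these atoms" relationship through the molecule data structures and
out into a V3000 MOLfile or CXSMILES.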

All the best,
John

On Wed, 2 Dec 2020 at 15:05, Christoph Steinbeck <
christoph.steinb...@uni-jena.de> wrote:

> Dear Manuel,
>
> if you open the mol file in a text editor, there are clearly 31 C atoms in
> the file.
> So the CDK is “right”. I also opened the file in Marvin Sketch and it
> output the analysis below.
>
> ChemDraw uses a fishy trick, as it seems, to create the illusion of a
> multi-center attachment. Clearly, they focus on publication-ready drawing
> of chemical structures and not on creating correct file representations of
> the chemistry. Fact is that the end of the line to the center of the
> benzene ring is a carbon atom and nothing else.
>
> Kind regards,
>
> Chris
>
> —
> Prof. Dr. Christoph Steinbeck
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> Phone Secretariat: +49-3641-948171
> http://cheminf.uni-jena.de
> http://orcid.org/0000-0001-6966-0814
>
> What is man but that lofty spirit - that sense of enterprise.
> ... Kirk, "I, Mudd," stardate 4513.3..
>
>
>
>
>
> > On 2. Dec 2020, at 14:38, Stesycki, Manuel <stesy...@mpi-muelheim.mpg.de>
> wrote:
> >
> > Dear CDK users,
> >
> > we are using CDK version 2.3 in our application.
> > As a user tried to add a structure (see attachment) we found a
> difference in the molecular formula of the structure.
> >
> > The original structure was drawn with ChemDraw 18.
> > A multi-center attachment was added to the structure and ChemDraw shows
> this molecular formula: C30H46Cl2PRu
> >
> > Whereas our application takes the mol-version of the cdx-file and
> computes this formula: C31H49Cl2PRu
> > To get the formula we use this piece of code:
> >
> > IMolecularFormula form =
> MolecularFormulaManipulator.getMolecularFormula(mol);
> > sumFormula = MolecularFormulaManipulator.getString(form);
> >
> > Did we miss something by creating the AtomContainer?
> > We create the atomcontainer directly by parsing the mol-file:
> > try (StringReader sr = new StringReader(molFile); MDLV2000Reader mr =
> new MDLV2000Reader(sr, mode)) {
> >
> >             AtomContainer mol = new AtomContainer();
> >             AtomContainer ac = mr.read(mol);
> > }
> >
> > Maybe someone can give us a hint, what we are doing wrong.
> >
> > Best regards,
> >    Manuel Stesycki
> >
> > IT
> >    0208 / 306-2146
> >    Physikbau, Büro 117
> >    stesy...@mpi-muelheim.mpg.de
> >
> > Max-Planck-Institut für Kohlenforschung
> >    Kaiser-Wilhelm-Platz 1
> >    D-45470 Mülheim an der Ruhr
> >    http://www.kofo.mpg.de/de
> >
> > _______________________________________________
> > Cdk-user mailing list
> > Cdk-user@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/cdk-user
>
>
>
>