Hi there,

 

To follow up on this issue, I applied Tim's patch and so far I have about
200 molecules and complexes.  I'm unfamiliar with SMILES so I can't quickly
check whether they are correct, but please feel free to look at them.

 

https://dl.dropboxusercontent.com/u/5381783/newsmiles.txt

 

If they are correct then I would support adding this patch to the next
OpenBabel release.

 

Thanks,

 

- Lee-Ping

 

From: Lee-Ping Wang [mailto:leep...@stanford.edu] 
Sent: Tuesday, February 18, 2014 7:25 PM
To: Tim Vandermeersch
Cc: Geoffrey Hutchison; Craig James; OpenBabel list; Openbabel-DEV
Subject: Re: [Open Babel] [OpenBabel-Devel] Multimolecule canonical SMILES

 

Hi there,

 

Thanks for including me in the developer discussion. The bug report must
have come from somebody else, because I didn't submit one.

 

I'm actually looking at a bunch of chemical reactions, but I had issues with
the OpenBabel functionality for reactions, so I only use it on the
individual reactants and products.

 

Below is a bunch of complexes, written as OpenBabel canonical SMILES, in
which formaldehyde appears with different atom orderings ([CH2]=O or
O=[CH2]) and in different positions.  My temporary workaround is to split
the complex by ".", canonicalize each fragment, and sort the fragments
alphabetically.  It would be great to have the fix implemented though.
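
The workaround described above can be sketched in Python roughly as follows. This is only an illustration, not the patch: the names canonicalize_complex and canonicalize_fragment are hypothetical, and the per-fragment canonicalizer is passed in as a callable so that any toolkit can supply it (for example, a wrapper that pipes one fragment through "obabel -ismi -ocan").

```python
from typing import Callable


def canonicalize_complex(smiles: str,
                         canonicalize_fragment: Callable[[str], str]) -> str:
    """Return an order-independent SMILES for a multi-fragment complex.

    Splits on ".", canonicalizes each fragment on its own, and sorts the
    results alphabetically. `canonicalize_fragment` is a hypothetical
    callable supplied by the caller (not an Open Babel API).
    """
    # Assumption: no ring-closure digits are shared across "." separators,
    # so every piece is a valid stand-alone SMILES.
    fragments = [frag for frag in smiles.strip().split(".") if frag]
    canonical = sorted(canonicalize_fragment(frag) for frag in fragments)
    return ".".join(canonical)
```

With a consistent per-fragment canonicalizer, two inputs that list the same fragments in a different order (or with formaldehyde written differently) should come out as the same string. The split on "." is a simplification: SMILES allows ring-closure bonds to span a dot, and such inputs would need a real parser.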

 

Thanks,

 

- Lee-Ping

 

O=[CH2].[NH3].[NH4]
[CH2]=O
[NH2][C@@H]1[NH]O1.[CH2]=O
O=[CH2].[NH3].[NH3]
[CH2]=O.[NH3]
[CH2]=O.[OH2].[NH3]
[CH2]=O.[OH2]
O=[CH2].[OH2].[OH2]
[CH2]=O.[OH2].[NH3]
[CH2]=O.[NH3]
[CH2]=O.[OH2].[NH3]
[CH3][CH2][NH2].[CH2]=O
O=[CH2].[NH3].[H][H]
O=[CH2].[NH3].[NH4]
[CH2]=O.[OH2]
[CH2]=O.[NH3]
[CH2]=O.[OH2].[NH3]
[CH2]=O.[NH2]
[CH2]=O
O=[CH2].O=[C]
O=[CH2].O=[C]
[CH2]=O.[CH2]
[CH2]=O.[OH2]
[CH2]=O.[NH]
[CH2][C][O].[CH2]=O
[OH]N=[CH2].[CH2]=O.[OH2]
[NH][CH][OH].[CH2]=O.[OH2]
[NH][CH]O[CH2]OC(=O)[OH].[CH2]=O
[NH][CH]O[CH2]OC(=O)[OH].[CH2]=O
[OH][CH2]OC(=O)[OH].[CH2]=O.[C][NH]
[CH2]=O.[OH2]
O=[CH2].[H][H]
[CH2]=O.[OH2]
[CH2]=O.[OH2].[NH3]
[CH2]=O.[OH2]
[CH2]=O.[CH3][NH2].[NH3]
[CH2]=O.[NH3]
[CH2]=O
O1O[CH]1.O=[CH2]
O=[CH2].[CH3][CH]
[CH2]=O.[O]

On Feb 17, 2014, at 11:10 AM, Tim Vandermeersch
<tim.vandermeer...@gmail.com> wrote:





Hi,

 

I found the examples in the bug report. The code in canon.h handles
disconnected fragments correctly. However, in smilesformat.cpp the symmetry
classes are computed for the molecule as a whole. This is not what the
canonical coding algorithm expects, so the SMILES written for a fragment may
differ depending on whether it is canonicalized separately or as part of a
complex.
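
One way to detect the symptom described above, sketched in Python under stated assumptions: fragments_consistent is a hypothetical helper, and canonicalize stands for any function that maps a SMILES string to its canonical form (it is not a specific Open Babel call). The check compares the fragments of the canonicalized complex against the same fragments canonicalized one at a time.

```python
from typing import Callable


def fragments_consistent(complex_smiles: str,
                         canonicalize: Callable[[str], str]) -> bool:
    """True if each fragment gets the same canonical SMILES whether it is
    canonicalized inside the complex or on its own.

    Fragment order is ignored by comparing sorted lists. Assumes that "."
    separates independent fragments in both input and output.
    """
    # Canonicalize the whole complex, then look at its fragments.
    whole = sorted(canonicalize(complex_smiles).strip().split("."))
    # Canonicalize each input fragment separately.
    separate = sorted(canonicalize(frag).strip()
                      for frag in complex_smiles.strip().split("."))
    return whole == separate
```

A complex for which this returns False is a candidate example of the behaviour discussed in this thread.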

 

I have a simple patch to fix this here:
http://moldb.net/timvdm/multi_mol_smiles.patch

 

With this patch I get the correct results:

 

$ echo 'OC(=O)[C@@H]([C@H](C(=O)O)O)O.CNC[C@@H](c1ccc(c(c1)O)O)O' | obabel -ismi -ocan

O[C@H]([C@H](C(=O)O)O)C(=O)O.CNC[C@@H](c1ccc(c(c1)O)O)O

1 molecule converted

$ echo 'OC(=O)[C@@H]([C@H](C(=O)O)O)O' | obabel -ismi -ocan

O[C@H]([C@H](C(=O)O)O)C(=O)O

1 molecule converted

 

This patch should only affect multi-molecule smiles. Single fragment smiles
are not affected. I can run this patch on a few million molecules and
compare the results to get a better view (with more confidence) of what
would actually change.
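
A before/after comparison of this kind could look roughly like the Python sketch below. It assumes two sequences of canonical SMILES, one produced without the patch and one with it, in the same molecule order; changed_smiles is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def changed_smiles(before: Iterable[str],
                   after: Iterable[str]) -> List[Tuple[int, str, str, bool]]:
    """List entries whose canonical SMILES differ between two runs.

    Each result is (index, before, after, is_multi_fragment), where
    is_multi_fragment is True when the "before" SMILES contains ".".
    Note: zip() stops at the shorter input, so both runs are assumed to
    contain the same molecules in the same order.
    """
    changes = []
    for index, (old, new) in enumerate(zip(before, after)):
        old, new = old.strip(), new.strip()
        if old != new:
            changes.append((index, old, new, "." in old))
    return changes
```

If the patch behaves as expected, every reported change should have is_multi_fragment set to True; a changed single-fragment entry would deserve a closer look.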

 

Tim 

 

On Mon, Feb 17, 2014 at 6:41 PM, Tim Vandermeersch
<tim.vandermeer...@gmail.com> wrote:

Hi,

 

Are there any examples of disconnected SMILES that have this problem? IIRC,
a canonical code is created for each fragment individually and these are
later sorted to create the entire canonical order. A quick look at the code
confirms this but I'll try to test some cases tonight to see if this is
still the case.

 

Tim

 

On Mon, Feb 17, 2014 at 4:50 PM, Geoffrey Hutchison
<geoff.hutchi...@gmail.com> wrote:

> This is actually a pretty bad thing. But it may not be that easy to fix,
> and would result in a major change to the SMILES that OpenBabel produces
> (very unfortunate, as it requires large databases to be completely
> re-canonicalized).

As far as I can tell, the code is still there. As for the different
canonicalization, we had a variety of bugs in the canonicalization going
from 2.2 -> 2.3. I agree, it's not pleasant, but I think it's an important
bug to fix soon and will be part of the 2.4.0 release.


-Geoff


_______________________________________________
OpenBabel-Devel mailing list
openbabel-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

 

 

 

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
