Thanks for the explanation - much clearer now!

On Thu, Nov 7, 2013 at 1:05 PM, John May <[email protected]> wrote:

> Hi all,
>
> Yep, the SMILES parser changed on master and won’t accept invalid SMILES by
> default. Notice how Daylight rejects it also. It should now be the case
> that if CDK rejects it, Daylight also rejects it (if not, then it’s a bug).
> The new parser automatically kekulises on load, verifying that bond orders
> can be assigned to aromatic systems. This is much friendlier for the CDK, as
> you don’t have molecules with all single aromatic bonds floating about.
> When we added this it fixed 2 failing unit tests.
>
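To make the kekulisation check concrete for myself, here is a minimal sketch in Python. It is not the CDK parser's code. It assumes a simplified model in which every aromatic atom that does not donate a lone pair must receive exactly one double bond, so the check becomes a perfect-matching search. The function name `can_kekulise` and the atom numbering are made up for illustration.

```python
def can_kekulise(bonds, needs_double):
    """Return True if every atom in needs_double can get exactly one double bond.

    bonds        -- iterable of (atom, atom) pairs for the aromatic bonds
    needs_double -- atoms that must take part in one double bond
                    (atoms donating a lone pair, e.g. [nH], are left out)
    """
    neighbours = {a: set() for a in needs_double}
    for a, b in bonds:
        # Only bonds between two atoms that both need a double bond can be double.
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    def match(unmatched):
        # Backtracking perfect matching; fine for small ring systems only.
        if not unmatched:
            return True
        atom = min(unmatched)
        for other in neighbours[atom]:
            if other in unmatched and match(unmatched - {atom, other}):
                return True
        return False

    return match(frozenset(needs_double))


# Five-membered ring, atom 0 is the nitrogen.
ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
pyrrole_with_h = can_kekulise(ring5, {1, 2, 3, 4})        # like '[nH]1cccc1'
pyrrole_without_h = can_kekulise(ring5, {0, 1, 2, 3, 4})  # like 'n1cccc1'
```

Under these assumptions the ring with the hydrogen on nitrogen can be assigned bond orders, and the one without it cannot, because five atoms cannot be paired up.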
> In this molecule you're missing a hydrogen on one or more nitrogens;
> knowing which ones is the problem.
>
> The SMILES should be:
>
> c4ccc2c(cc1=Nc3[nH]cccc3(Cn12))c4
>
> Some toolkits will *fix* this by default, but that’s making several
> assumptions and it’s nothing more than a hack for broken SMILES input. To
> fix this you need to change the formula of the molecule, which is never a
> good start. You can still parse it with the CDK by turning on ‘preserve
> aromaticity’ (needs renaming); this disables electron checking, but I
> strongly discourage that. The actual fix involves checking every possible
> combination of hydrogens on aromatic nitrogens and phosphorus atoms; check
> out the fixarom core from http://www.daylight.com/download/contrib/.
>
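The "check every combination of hydrogens" idea can be sketched as below. This is not the Daylight fixarom code, which I have not read. It is a brute-force illustration under the same simplified assumption as before: an aromatic atom either gets one double bond or, if it carries the added hydrogen, donates a lone pair and is skipped. The names `has_perfect_matching` and `hydrogen_placements` are hypothetical.

```python
from itertools import combinations


def has_perfect_matching(atoms, bonds):
    """True if the given atoms can all be paired up along the given bonds."""
    atoms = frozenset(atoms)
    neighbours = {a: set() for a in atoms}
    for a, b in bonds:
        if a in atoms and b in atoms:
            neighbours[a].add(b)
            neighbours[b].add(a)

    def match(unmatched):
        if not unmatched:
            return True
        atom = min(unmatched)
        return any(match(unmatched - {atom, other})
                   for other in neighbours[atom] if other in unmatched)

    return match(atoms)


def hydrogen_placements(aromatic_atoms, bonds, nitrogens):
    """List every set of nitrogens that, given one H each, allows kekulisation.

    Results are ordered by the number of hydrogens added, fewest first.
    """
    solutions = []
    for k in range(len(nitrogens) + 1):
        for chosen in combinations(sorted(nitrogens), k):
            # A nitrogen with an added H donates a lone pair: no double bond needed.
            remaining = set(aromatic_atoms) - set(chosen)
            if has_perfect_matching(remaining, bonds):
                solutions.append(chosen)
    return solutions


# Imidazole-like ring written without any hydrogen, 'n1cncc1':
# atoms 0 and 2 are nitrogens, 1, 3 and 4 are carbons.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
placements = hydrogen_placements(range(5), ring, [0, 2])
```

For this ring the sketch finds two equally valid placements, a hydrogen on atom 0 or on atom 2, which is the "knowing which ones is the problem" point: the search can list candidates but cannot tell which one the author of the SMILES meant.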
> Now, where these molecules come from is probably more interesting. Most
> likely it’s people using the aromaticity models on formats which don’t
> support it. The MDL model, for example, doesn’t allow lone pair
> contributions. If you have MarvinSketch, try loading ‘[nH]1cccc1’ and then
> generating an MDL mol file. You’ll notice they have their own non-portable
> workaround to ensure the hydrogen is kept. Of course everyone knows you
> should never store aromaticity in the mol file :-).
>
>   Mrv0541 11071317592D
>
>   5  5  0  0  0  0            999 V2000
>     1.2964    0.6723    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     1.9639    0.1874    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.7089   -0.5972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.8839   -0.5972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.6290    0.1874    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  4  0  0  0  0
>   2  3  4  0  0  0  0
>   4  5  4  0  0  0  0
>   1  5  4  0  0  0  0
>   3  4  4  0  0  0  0
> M  STY  1   1 DAT
> M  SAL   1  1   1
> M  SDT   1 MRV_IMPLICIT_H
> M  SDD   1     0.0000    0.0000    DR    ALL  0       0
> M  SED   1 IMPL_H1
> M  END
>
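For anyone who has to read such files, here is a rough sketch of pulling the Marvin hydrogen annotation back out of the mol file above. It assumes the usual V2000 Sgroup property lines (SAL lists the atoms of an Sgroup, SDT names the data field, SED holds the data). Reading `IMPL_H<n>` as "n implicit hydrogens" is my inference from this single example, not something taken from Marvin documentation. It splits on whitespace instead of using the fixed columns of the format, and `marvin_implicit_h` is a made-up name.

```python
import re


def marvin_implicit_h(molfile_text):
    """Map atom number -> implicit H count from MRV_IMPLICIT_H data Sgroups."""
    atoms = {}   # Sgroup index -> atom numbers (from M  SAL)
    fields = {}  # Sgroup index -> field name   (from M  SDT)
    data = {}    # Sgroup index -> data string  (from M  SED)
    for line in molfile_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != "M":
            continue
        tag = parts[1]
        if tag == "SAL":
            count = int(parts[3])
            atoms.setdefault(int(parts[2]), []).extend(
                int(p) for p in parts[4:4 + count])
        elif tag == "SDT":
            fields[int(parts[2])] = parts[3]
        elif tag == "SED":
            data[int(parts[2])] = parts[3]

    result = {}
    for sgroup, name in fields.items():
        found = re.fullmatch(r"IMPL_H(\d+)", data.get(sgroup, ""))
        if name == "MRV_IMPLICIT_H" and found:
            for atom in atoms.get(sgroup, []):
                result[atom] = int(found.group(1))
    return result
```

Fed the property block from the file above, this associates atom 1 (the nitrogen) with one hydrogen. Any other toolkit reading the same file would ignore the Sgroup, which is why the workaround is not portable.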
>
> Oh, some more examples which are now correctly rejected:
>
> C/1.C/C=C/1
>
> C-1.C/C=C=1
>
> ccc
>
> ccccc
>
> p1cccc1 <- generated by older CDK versions!
>
>
> Cheers,
> J
>
> On 7 Nov 2013, at 16:33, Nina Jeliazkova <[email protected]>
> wrote:
>
>
>
>
> On 7 November 2013 18:26, Nina Jeliazkova <[email protected]> wrote:
>
>>
>>
>>
>> On 7 November 2013 18:18, Rajarshi Guha <[email protected]> wrote:
>>
>>> It seems
>>>
>>> c4ccc2c(cc1=Nc3ncccc3(Cn12))c4
>>>
>>> does not parse using the latest CDK master, but does parse fine using
>>> http://apps.ideaconsult.net:8080/ambit2/depict?search=c4ccc2c%28cc1%3DNc3ncccc3%28Cn12%29%29c4&smarts=
>>>
>>> I'm not sure what version ambit is using
>>>
>>
>> cdk 1.4.11
>>
>
> There is also a test version using CDK 1.5.3 (Sep 2013), and it seems to
> parse fine there:
>
> http://apps.ideaconsult.net:8080/bioclipse/depict?search=c4ccc2c%28cc1%3DNc3ncccc3%28Cn12%29%29c4&smarts=
>
>
> Nina
>
>>
>> Regards,
>> Nina
>>
>>>  but could somebody confirm this issue with the latest master?
>>>
>>> --
>>> Rajarshi Guha | http://blog.rguha.net
>>> NIH Center for Advancing Translational Science
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> November Webinars for C, C++, Fortran Developers
>>> Accelerate application performance with scalable programming models.
>>> Explore
>>> techniques for threading, error checking, porting, and tuning. Get the
>>> most
>>> from the latest Intel processors and coprocessors. See abstracts and
>>> register
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Cdk-user mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>>
>>
>
>
>
>


-- 
Rajarshi Guha | http://blog.rguha.net
NIH Center for Advancing Translational Science