Hi Joos, I concur with Nina's view on library vs database search.
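
To make that point concrete, here is a minimal sketch of the kind of fingerprint pre-screen a database search typically runs before any atom-by-atom matching. It is not SMSD or CDK code: the class and method names are made up for illustration, and the fingerprints are hand-set bits standing in for whatever a real fingerprinter would produce.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Fingerprint pre-screen sketch; every name in this class is hypothetical. */
final class PrescreenSketch {

    /** Builds a fingerprint from bit positions (a stand-in for a real fingerprinter). */
    static BitSet fingerprint(int... bits) {
        BitSet fp = new BitSet();
        for (int b : bits) {
            fp.set(b);
        }
        return fp;
    }

    /**
     * Returns true if every bit set in the query is also set in the target.
     * A true result only means the target MAY contain the query; a false
     * result rules the target out without any graph matching.
     */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // keep only query bits the target lacks
        return missing.isEmpty();
    }

    /** Returns the indices of the targets that survive the screen. */
    static List<Integer> screen(List<BitSet> targets, BitSet query) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            if (mayContain(targets.get(i), query)) {
                candidates.add(i);
            }
        }
        return candidates;
    }
}
```

Only the structures that pass the screen need the expensive isomorphism test (UIT, VF2, ...), which is one reason a database search can feel so much faster than a bare library call.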

I was just wondering whether you had used the right code:

https://github.com/asad/SMSD/tree/master/src/org/openscience/smsd/algorithm/vflib/substructure

Kindly test it with the above code,

or use
https://github.com/asad/SMSD/blob/master/src/org/openscience/smsd/Substructure.java

Asad
 
On 8 Sep 2011, at 10:48, [email protected] wrote:

> Send Cdk-user mailing list submissions to
>       [email protected]
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>       https://lists.sourceforge.net/lists/listinfo/cdk-user
> or, via email, send a message with subject or body 'help' to
>       [email protected]
> 
> You can reach the person managing the list at
>       [email protected]
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Cdk-user digest..."
> 
> 
> Today's Topics:
> 
>   1. comment about VF2 implementation from chemkit - possible bug
>      (Joos Kiener)
>   2. Re: comment about VF2 implementation from chemkit - possible
>      bug (Nina Jeliazkova)
>   3. Re: inchi generator and valency errors (Sam Adams)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 8 Sep 2011 11:24:41 +0200
> From: Joos Kiener <[email protected]>
> Subject: [Cdk-user] comment about VF2 implementation from chemkit -
>       possible        bug
> To: [email protected]
> Message-ID:
>       <cahjbz72xswb6yosv0t9fcto1pzh0w9ozo1mfke5mikjgfbf...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hi all,
> 
> First off, I guess you will be hearing more from me sooner rather than later,
> but now to the actual subject. Please see:
> 
> http://chembioinfo.wordpress.com/2011/03/15/benchmarking-substructure-search/
> 
> for the context of this message.
> 
> Currently I'm playing around with substructure search (I have a certain
> goal in mind, more on that in later messages). Anyway, UIT isn't exactly fast,
> especially compared to commercial products like ChemFinder or InstantJChem,
> where searches seem almost instantaneous.
> 
> I was comparing UIT and the above referenced code ported from chemkit. First,
> the difference in real-world usage seems much less extreme than in that
> benchmark (for small molecules), or I'm misinterpreting the chart. Anyway, in
> my case it takes about 60% of the time UIT does.
> 
> Now to the subject of the message. I think there is an issue in the ported
> version. The following query returns 44 hits with chemkit and 106 with UIT.
> ChemFinder also gives 106 hits, so I'm inclined to believe 106 is correct.
> 
> Here is the query molecule:
> 
> CCC(C(CC(C(C)C)C)C)C
> 
> I did not find or check for any other inconsistencies.
> 
> Best Regards,
> 
> Joos Kiener
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 8 Sep 2011 12:34:06 +0300
> From: Nina Jeliazkova <[email protected]>
> Subject: Re: [Cdk-user] comment about VF2 implementation from chemkit
>       - possible bug
> To: Joos Kiener <[email protected]>
> Cc: [email protected]
> Message-ID:
>       <CAE5qDd112RU46mk-goCdVVR1-Ni=3koflsdzayqztsso8z3...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> On 8 September 2011 12:24, Joos Kiener <[email protected]> wrote:
> 
>> Hi all,
>> 
>> First off, I guess you will be hearing more from me sooner rather than later,
>> but now to the actual subject. Please see:
>> 
>> 
>> http://chembioinfo.wordpress.com/2011/03/15/benchmarking-substructure-search/
>> 
>> for the context of this message.
>> 
>> Currently I'm playing around with substructure search (I have a certain
>> goal in mind, more on that in later messages). Anyway, UIT isn't exactly fast,
>> especially compared to commercial products like ChemFinder or InstantJChem,
>> where searches seem almost instantaneous.
>> 
>> 
> Just to note that comparing a library method (such as UIT) with a database
> search is not quite fair, as database search systems usually employ a lot of
> pre-screening and other optimization techniques.
> 
> For example, this online search uses CDK (but not UIT):
> 
> http://apps.ideaconsult.net:8080/ambit2/query/smarts?type=smiles&search=CCC%28C%28CC%28C%28C%29C%29C%29C%29C&text=&page=0&pagesize=100
> 
> 
> Best regards,
> Nina
> 
> 
>> I was comparing UIT and the above referenced code ported from chemkit.
>> First, the difference in real-world usage seems much less extreme than in
>> that benchmark (for small molecules), or I'm misinterpreting the chart.
>> Anyway, in my case it takes about 60% of the time UIT does.
>> 
>> Now to the subject of the message. I think there is an issue in the ported
>> version. The following query returns 44 hits with chemkit and 106 with UIT.
>> ChemFinder also gives 106 hits, so I'm inclined to believe 106 is correct.
>> 
>> Here is the query molecule:
>> 
>> CCC(C(CC(C(C)C)C)C)C
>> 
>> I did not find or check for any other inconsistencies.
>> 
>> Best Regards,
>> 
>> Joos Kiener
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 8 Sep 2011 10:48:40 +0100
> From: Sam Adams <[email protected]>
> Subject: Re: [Cdk-user] inchi generator and valency errors
> To: Nina Jeliazkova <[email protected]>
> Cc: Sam Adams <[email protected]>, [email protected]
> Message-ID:
>       <CALiCMJ5ZrZ=aw3mn682n+l+6w8m5muodqhb+cwx8a_r1ta0...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi,
> 
> InChI generation should work with either implicit or explicit hydrogens.
> 
> It looks like there's a bug in the passing of aromatic bonds from CDK to
> InChI (do I remember correctly that CDK's aromaticity handling got adjusted
> a couple of years ago?).  Anyway, deleting lines 294-295 from InChIGenerator
> should fix things.
> 
> https://github.com/egonw/cdk/blob/master/src/main/org/openscience/cdk/inchi/InChIGenerator.java#L294:
> if (bond.getFlag(CDKConstants.ISAROMATIC)) {
>     order = INCHI_BOND_TYPE.ALTERN;
> 
> I haven't got CDK on my machine at the moment, so it would be quicker for
> someone else to make the changes.
> 
> Cheers,
> 
> Sam
> 
> 
> On 4 September 2011 07:39, Nina Jeliazkova <[email protected]> wrote:
> 
>> 
>> 
>> On 4 September 2011 09:25, Egon Willighagen <[email protected]> wrote:
>> 
>>> cc:Sam (author of the CDK-InChI bridge)
>>> 
>>> On Sun, Sep 4, 2011 at 7:52 AM, Nina Jeliazkova
>>> <[email protected]> wrote:
>>>> This usually happens, when the molecule does not contain explicit
>>>> hydrogens.
>>> 
>>> I was not aware of that. Doesn't sound very useful. We don't have a
>>> unit test for this yet, do we? One that tests InChI generation for a
>>> compound with and without explicit hydrogens?
>>> 
>>> Does this happen for any compound?
>>> 
>> 
>> Aromatics only.
>> 
>> I ran into this issue only recently, when working on metabolite generation
>> in Toxtree; I haven't tested on a large scale.
>> 
>> 
>>> 
>>>> e.g. the test below fails with exactly the same message: "Accepted
>>>> unusual valence(s): C(3); Cannot process aromatic bonds"
>>>> SmilesParser p = new SmilesParser(NoNotificationChemObjectBuilder.getInstance());
>>>> IMolecule mol = p.parseSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
>>>> /*
>>>> CDKHydrogenAdder ha = CDKHydrogenAdder.getInstance(NoNotificationChemObjectBuilder.getInstance());
>>>> ha.addImplicitHydrogens(mol);
>>>> AtomContainerManipulator.convertImplicitToExplicitHydrogens(mol);
>>>> */
>>>> InChIGeneratorFactory factory = InChIGeneratorFactory.getInstance();
>>>> InChIGenerator gen = factory.getInChIGenerator(mol);
>>>> INCHI_RET ret = gen.getReturnStatus();
>>>> if (ret != INCHI_RET.OKAY) {
>>>>     throw new Exception(String.format("InChI failed: %s [%s]", ret.toString(), gen.getMessage()));
>>>> }
>>>> String inchi = gen.getInchi();
>>>> Assert.assertEquals("InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3", inchi);
>>> 
>>> I'll try to use this to create a unit test. I'll also make one for
>>> methane... wondering how widespread this issue is...
>>> 
>> 
>> Alkanes work fine.
>> 
>> Nina
>> 
>> 
>>> 
>>>> Uncomment the hydrogen adder code and the test will succeed.
>>>> I haven't investigated whether the explicit H requirement is the normal
>>>> InChI behaviour or something in the cdk-inchi interaction; perhaps others
>>>> could help.  Otherwise, I agree the current behaviour is not very
>>>> convenient.
>>> 
>>> Sam, do you know what is going on?
>>> 
>>> Egon
>>> 
>>> 
>>> --
>>> Dr E.L. Willighagen
>>> Postdoctoral Researcher
>>> Institutet för miljömedicin
>>> Karolinska Institutet (http://ki.se/imm)
>>> Homepage: http://egonw.github.com/
>>> LinkedIn: http://se.linkedin.com/in/egonw
>>> Blog: http://chem-bla-ics.blogspot.com/
>>> PubList: http://www.citeulike.org/user/egonw/tag/papers
>>> 
> 
> ------------------------------
> 
> End of Cdk-user Digest, Vol 64, Issue 5
> ***************************************

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
