Hi Joos,

A short, quick reply, as I will not have time to look into the issue in detail in the next two weeks...
On Thu, Dec 8, 2011 at 12:47 PM, Joos Kiener <[email protected]> wrote:
> The question is related to the CDK-based project I'm working on, which I
> will "officially release" once I believe it is usable enough. That would
> be the 1.4 series.
>
> I use UIT for subgraph matching and the ExtendedFingerprinter. I had the
> feeling that the fingerprint wasn't especially great, at least for the
> dataset used (part of subset 13 of ZINC), and hence I wanted to try out
> the PubchemFingerprinter, which I did, but now I was getting a different
> number of search hits than before. See the tables below. I'm now
> wondering whether it is a bug on my part or in the fingerprints and/or
> UIT. How can I determine the actually correct result? Especially since
> the reference also disagrees with UIT.
>
> PubchemFingerprinter:
>
> SMILES            Screening hits   Hits
> CCC(C)C(C)C(C)C   8599             344
>
> ExtendedFingerprinter:
>
> SMILES            Screening hits   Hits
> CCC(C)C(C)C(C)C   22488            429
>
> No screening, just UIT:
>
> SMILES            Hits
> CCC(C)C(C)C(C)C   436
>
> As a reference, the same searches were done in ChemFinder over the same
> data set:
>
> SMILES            Hits found in ChemFinder
> CCC(C)C(C)C(C)C   427

So, one would expect to find 436 hits with the CDK for each of the three
approaches: the fingerprint screen should only discard structures that
cannot contain the query, so it must never remove a true substructure hit.
The difference from the 427 found by ChemFinder can have many reasons
(preprocessing, their substructure matching, ...), and I am not eager to
speculate on why it differs.

It is indeed worrying to see that the PubchemFingerprinter and the
ExtendedFingerprinter apparently miss true positives (92 and 7,
respectively). Can you identify those structures? Maybe start with the
seven that the ExtendedFingerprinter does not find. Then we can start
debugging why they are missed...

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
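The first debugging step suggested in the reply, finding the true hits that the fingerprint screen throws away, amounts to a set comparison. The Java sketch below is a hypothetical helper, not CDK API: it assumes the query and target fingerprints are already available as java.util.BitSet objects keyed by a molecule identifier, and that the hit list of the unscreened UIT search is known. A correct screen keeps a target only when every bit set in the query fingerprint is also set in the target, so any exhaustive hit that fails this test is a false negative of the fingerprint (or of how it was computed).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class; names and data layout are illustrative assumptions.
public class ScreeningCheck {

    // A target passes the screen only if every query bit is also set in the target.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits set in the query but not in the target
        return missing.isEmpty();
    }

    // IDs reported as hits by the exhaustive search (UIT without screening)
    // that the fingerprint screen would have rejected. A target without a
    // fingerprint is treated as rejected, since it could never pass the screen.
    static List<String> missedByScreen(BitSet query,
                                       Map<String, BitSet> targetFingerprints,
                                       Set<String> exhaustiveHits) {
        List<String> missed = new ArrayList<>();
        for (String id : new TreeSet<>(exhaustiveHits)) { // sorted for stable output
            BitSet fp = targetFingerprints.get(id);
            if (fp == null || !passesScreen(query, fp)) {
                missed.add(id);
            }
        }
        return missed;
    }

    // Convenience builder for toy fingerprints.
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) {
            b.set(i);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet query = bits(1, 5, 9);
        Map<String, BitSet> targets = new TreeMap<>();
        targets.put("mol-A", bits(1, 5, 9, 12));
        targets.put("mol-B", bits(1, 9, 12)); // lacks query bit 5
        targets.put("mol-C", bits(2, 3));
        Set<String> uitHits = new TreeSet<>(List.of("mol-A", "mol-B"));
        System.out.println(missedByScreen(query, targets, uitHits)); // [mol-B]
    }
}
```

On the toy data this prints [mol-B], because mol-B is a substructure hit yet lacks query bit 5. Fed with the real ExtendedFingerprinter data, the returned identifiers should be the seven structures worth inspecting first.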

