Hi Joos,

a short, quick reply... I will not have time to look in detail into
the issue in the next two weeks...

On Thu, Dec 8, 2011 at 12:47 PM, Joos Kiener <[email protected]> wrote:
> The Question is related to the cdk based project I'm working on which I will
> "officially release" once I believe it is usable enough.

That would be the 1.4 series.

> I use UIT for Subgraph matching and the ExtendedFingerprinter. I had the
> feeling that the fingerprint wasn't especially great at least for the used
> dataset (Part of Subset 13 of ZINC) and hence I wanted to try out the
> PubchemFingerprinter, which I did, but now I am getting a different number of
> search hits than before. See below tables. I'm now wondering if it is a bug
> on my part or in the Fingerprints and/or UIT. How can I determine the
> actually correct result? Especially since the reference also disagrees with
> UIT.
>
> PubchemFingerprinter:
>
> SMILES                    Screening Hits    Hits
> CCC(C)C(C)C(C)C               8599         344
>
> ExtendedFingerprinter
>
> SMILES                    Screening Hits    Hits
> CCC(C)C(C)C(C)C                22488        429
>
> No Screening, just UIT:
>
> SMILES                                              Hits
> CCC(C)C(C)C(C)C                                436
>
> As a Reference the same Searches were done in ChemFinder over the same Data
> Set
>
> SMILES                        Hits Found in ChemFinder
> CCC(C)C(C)C(C)C                              427

So, one would expect to find 436 with the CDK for each of the three
approaches, since the fingerprint screening should only discard
structures that cannot match. The difference from the 427 in ChemFinder
can have many reasons (preprocessing, their substructure matching, ...)
and I am not eager to hypothesize on why that is different.

It is indeed worrying to see that apparently the PubchemFingerprinter
and the ExtendedFingerprinter miss out on true positives. Can you
identify those structures? Maybe start with the seven that the
ExtendedFingerprinter doesn't find (436 - 429). Then we can start
debugging why those are not found...
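
A minimal sketch of how the missed structures could be listed, assuming
you can collect the identifiers of the hits from the unscreened and the
screened run, and that the fingerprints are available as
java.util.BitSet. The class and method names (ScreeningCheck,
missedHits, passesScreen) are made up for illustration and are not part
of the CDK; only java.util classes are used.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not CDK API: compares hit lists and fingerprints.
class ScreeningCheck {

    // Identifiers found by plain subgraph matching but absent from the
    // screened run. These are the structures the screen dropped wrongly.
    static Set<String> missedHits(Set<String> unscreened, Set<String> screened) {
        Set<String> missed = new TreeSet<>(unscreened);
        missed.removeAll(screened);
        return missed;
    }

    // Usual screening condition: every bit set in the query fingerprint
    // must also be set in the target fingerprint.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits of the query that the target lacks
        return missing.isEmpty();
    }
}
```

For each identifier returned by missedHits, the bits left over in
"missing" would show which query bits the target fingerprint lacks,
which should be a good starting point for debugging.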

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user