On Apr 22, 2018, at 08:42, Nils Weskamp <nils.wesk...@gmail.com> wrote:
> Nice work. If brute-force approaches like this (or methods based on
> genetic algorithms etc.) are the only way to reverse a fingerprint, one
> could probably come up with a fingerprint that allows for pretty secure
> structure sharing by using many iterations of a strong cryptographic
> hash function that is really slow to calculate.

I usually think of "brute force" as something more like "enumerate all possible 
structures, generate the fingerprint for each one, and compare." I think of 
what I did here as a bit more elegant than that. ;)

I think of fingerprints as being applied to three uses:

1) identity test
    if mol1 == mol2 then fp(mol1) == fp(mol2)
    => if fp(mol1) != fp(mol2) then mol1 != mol2
    
A good identity fingerprint has the properties that the false positive rate of
     fp(mol1) == fp(mol2) but mol1 != mol2
is low, and that working with fingerprints gives advantages over a graph 
isomorphism check.

2) Substructure screening
  if mol1 is a subgraph of mol2 then fp(mol1) is a subset of fp(mol2).
    (There are different definitions of 'subset' for different fingerprint 
types.)

  An effective substructure screen fingerprint has the properties that:
      fp(mol1) is a subset of fp(mol2) => a higher likelihood that mol1 is a 
subgraph of mol2
       -and-
      fingerprint subset test is faster than subgraph isomorphism testing

3) Similarity comparison
  Use similarity of fp(mol1) and fp(mol2) as a proxy/estimate of the similarity 
of mol1 and mol2.

Usually we also assume that computing the similarity of two fingerprints is 
fast.


A cheminformatics fingerprint usually supports #1 and one or both of #2 or #3. 

If we only need #1, which is the use case Nils brought up, then we could use a 
SHA256 of the canonical SMILES string, or use the InChI Key. These are 
fixed-length binary fingerprints which can only be used for identity testing, 
and would give a low false positive rate.

The structure leakage comes from needing support for #2 and/or #3.

I don't see any reasonable way to make a fingerprint that can do #2 and/or #3 
without being open to some sort of enumeration scheme that is more clever than 
brute force.

Possibly some scheme related to homomorphic encryption might work? As I 
understand it, this would be unreasonable slow for what most people expect from 
fingerprints.

Cheers,

                                Andrew
                                da...@dalkescientific.com



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to