On Apr 22, 2018, at 08:42, Nils Weskamp <nils.wesk...@gmail.com> wrote: > Nice work. If brute-force approaches like this (or methods based on > genetic algorithms etc.) are the only way to reverse a fingerprint, one > could probably come up with a fingerprint that allows for pretty secure > structure sharing by using many iterations of a strong cryptographic > hash function that is really slow to calculate.
I usually think of "brute force" as something more like "enumerate all possible structures, generate the fingerprint for each one, and compare." I think of what I did here as a bit more elegant than that. ;) I think of fingerprints as being applied to three uses: 1) identity test if mol1 == mol2 then fp(mol1) == fp(mol2) => if fp(mol1) != fp(mol2) then mol1 != mol2 A good identity fingerprint has the properties that the false positive rate of fp(mol1) == fp(mol2) but mol1 != mol2 is low, and that working with fingerprints gives advantages over a graph isomorphism check. 2) Substructure screening if mol1 is a subgraph of mol2 then fp(mol1) is a subset of fp(mol2). (There are different definitions of 'subset' for different fingerprint types.) An effective substructure screen fingerprint has the properties that: fp(mol1) is a subset of fp(mol2) => a higher likelihood that mol1 is a subgraph of mol2 -and- fingerprint subset test is faster than subgraph isomorphism testing 3) Similarity comparison Use similarity of fp(mol1) and fp(mol2) as a proxy/estimate of the similarity of mol1 and mol2. Usually we also assume that computing the similarity of two fingerprints is fast. A cheminformatics fingerprint usually supports #1 and one or both of #2 or #3. If we only need #1, which is the use case Nils brought up, then we could use a SHA256 of the canonical SMILES string, or use the InChI Key. These are fixed-length binary fingerprints which can only be used for identity testing, and would give a low false positive rate. The structure leakage comes from needing support for #2 and/or #3. I don't see any reasonable way to make a fingerprint that can do #2 and/or #3 without being open to some sort of enumeration scheme that is more clever than brute force. Possibly some scheme related to homomorphic encryption might work? As I understand it, this would be unreasonable slow for what most people expect from fingerprints. Cheers, Andrew da...@dalkescientific.com ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss