Hi Jeff,
What you say is theoretically correct, in that it is probably not possible
to go from the fingerprint directly to a structure. However, it is possible
to generate structures and rapidly compare them to the target fingerprint.
The fingerprints are of course able to tell you how close your structure is
to the target fingerprint in a way that can drive an optimisation
algorithm. Chemistry adds strong constraints on which structures are
possible, which reduces the search space dramatically, and even more so if
you know you’re looking for a “drug-like” molecule.
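To make the generate-and-compare idea concrete, here is a minimal sketch of a GA whose fitness is the Tanimoto similarity to a target fingerprint. It is not Dave Weininger's method and it does not call RDKit: toy_fingerprint, ALPHABET, mutate, crossover and evolve are all hypothetical names, and the "fingerprint" simply hashes short substrings of a SMILES-like string into a fixed number of bits. In a real version the fingerprint would presumably come from something like RDKit's Chem.RDKFingerprint, and the fitness would be 0 whenever the SMILES parser rejects the candidate.

```python
import random
import zlib

N_BITS = 256
ALPHABET = "CNOc1()="  # toy SMILES-like symbols; an assumption for illustration


def toy_fingerprint(s, n_bits=N_BITS, max_len=4):
    """Hash every substring of length 1..max_len to a bit position.

    A crude stand-in for a path-based fingerprint; returns the set of on bits.
    """
    bits = set()
    for size in range(1, max_len + 1):
        for i in range(len(s) - size + 1):
            bits.add(zlib.crc32(s[i:i + size].encode()) % n_bits)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto similarity of two sets of on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def fitness(candidate, target_fp):
    # A real version would return 0.0 here whenever the SMILES parser
    # rejects the candidate; this toy only rejects the empty string.
    if not candidate:
        return 0.0
    return tanimoto(toy_fingerprint(candidate), target_fp)


def mutate(s, rng):
    """Insert, delete or replace one random character."""
    op = rng.choice(("insert", "delete", "replace")) if s else "insert"
    if op == "insert":
        pos = rng.randrange(len(s) + 1)
        return s[:pos] + rng.choice(ALPHABET) + s[pos:]
    pos = rng.randrange(len(s))
    if op == "delete":
        return s[:pos] + s[pos + 1:]
    return s[:pos] + rng.choice(ALPHABET) + s[pos + 1:]


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    return a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]


def evolve(target_fp, generations=200, pop_size=60, seed=0):
    """Return (best_string, best_fitness) after a simple elitist GA run."""
    rng = random.Random(seed)
    pop = [rng.choice(ALPHABET) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, target_fp), reverse=True)
        if fitness(pop[0], target_fp) == 1.0:
            break  # exact fingerprint match found
        parents = pop[:pop_size // 4]  # keep the fittest quarter unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=lambda s: fitness(s, target_fp))
    return best, fitness(best, target_fp)


if __name__ == "__main__":
    target_fp = toy_fingerprint("CC(=O)Nc1ccc(O)cc1")
    best, score = evolve(target_fp)
    print(best, round(score, 3))
```

Note that a score of 1.0 only means the fingerprints match, not that the string is the original one, which is exactly the collision point raised elsewhere in this thread.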
People forget that Daylight originally developed fingerprints to speed up
substructural searching of databases. A structure can only be a
substructure of another molecule if all the bits it sets are also set in
the other molecule’s fingerprint. Fingerprints are specifically designed to
encode the molecular structure, and that’s why a GA can be successful. As
Peter says, the same fingerprint can be generated for different molecules,
but this will be rare if the fingerprint is well designed. Try it on ChEMBL
with an RDKit fingerprint and I’ll be surprised if you get more than 10
pairs that aren’t isomers of each other or something trivial like that.
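As a rough illustration of both points, the sketch below shows the bit-subset screen and a way to list molecules that share an identical fingerprint. Fingerprints are represented as plain Python integers used as bit sets, and the names may_be_substructure and collision_groups, as well as the bit patterns in the example, are made up for illustration. With RDKit the bit vectors would presumably come from something like Chem.PatternFingerprint or Chem.RDKFingerprint. The screen is necessary but not sufficient: anything that passes still needs a real substructure match.

```python
from collections import defaultdict


def may_be_substructure(query_fp, target_fp):
    """Screen: every bit set in the query must also be set in the target."""
    return (query_fp & target_fp) == query_fp


def collision_groups(fingerprints):
    """Group names by identical fingerprint.

    `fingerprints` maps a name to an int bit set. Returns only the groups
    that contain more than one name, i.e. the fingerprint collisions.
    """
    groups = defaultdict(list)
    for name, fp in fingerprints.items():
        groups[fp].append(name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical bit patterns, not real fingerprints of any molecule.
    database = {"mol_a": 0b0111, "mol_b": 0b1001, "mol_c": 0b0111}
    query = 0b0011
    hits = [name for name, fp in database.items() if may_be_substructure(query, fp)]
    print("candidates passing the screen:", hits)
    print("identical fingerprints:", collision_groups(database))
```

Running collision_groups over fingerprints of a whole database and then inspecting each group by eye would be one way to try the ChEMBL experiment suggested above.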
Regards,
Dave

On Fri, 20 Apr 2018 at 18:49, Peter S. Shenkin <shen...@gmail.com> wrote:

> Well, @jeff, there's no law saying that hashes must collide, and in fact
> some are designed to make collision extremely unlikely (can you say
> "SHA-2"?). But the ones in question here do collide relatively frequently,
> for at least some molecular fingerprint types.
>
> An interesting question (maybe only to me :-) ) would be how similar, in
> general, the structures are that exhibit identical fingerprints, for the
> well-known fingerprint types, for various fingerprint lengths. A
> sufficiently complicated molecule will give lots of on bits, and for (say)
> a 64-bit fingerprint, there can only be 64 possible fingerprints with all
> but one bit turned on.
>
> I realize that most fingerprints in common use today are longer than this,
> but still, looking back at 64- and 32-bit fingerprints with all but one
> bit on might give some insight. How short does a fingerprint of some
> particular type have to be for, say, 10% of CHEMBL molecules to exhibit an
> all-on pattern? How short does it have to be for, say, 10% of CHEMBL
> molecules to have an exact fingerprint match with some other molecule?
>
> -P
>
> On Fri, Apr 20, 2018 at 1:03 PM, jeff godden <jgod...@gmail.com> wrote:
>
>> Long ago molecular fingerprints were referred to in the literature as
>> molecular hash functions. (y'know, those crazy mathematical algorithms
>> which permitted rapid lookup of some string in a lookup table)  As such, we
>> expected there to be the associated hash collisions  (
>> https://en.wikipedia.org/wiki/Hash_table#Collision_resolution ).  All
>> this by way of saying that to go from fingerprint to the molecular
>> structure which produced it is traditionally impossible unless the
>> fingerprint no longer amounts to a hash(ing) function.
>> --
>> j
>>
>>
>> On Fri, Apr 20, 2018 at 9:56 AM, Peter S. Shenkin <shen...@gmail.com>
>> wrote:
>>
>>> Isn't it the case that more than one molecule can share an identical
>>> fingerprint? (Depending on the specific fingerprint.) Think p-biphenyl,
>>> extended to triphenyl, tetraphenyl, etc. Still, a GA or SA method could
>>> keep going and come up with multiple matches, plus multiple near-misses.
>>>
>>> -P.
>>>
>>> On Fri, Apr 20, 2018 at 10:58 AM, David Cosgrove <
>>> davidacosgrov...@gmail.com> wrote:
>>>
>>>> Hi Brian,
>>>> Dave Weininger once showed a fairly simple GA that could generally
>>>> deduce a structure from a Daylight fingerprint by using SMILES strings as
>>>> the chromosomes and Tanimoto distance to the target fingerprint as the
>>>> fitness function.  He may have done a talk about it for MUG or conceivably
>>>> written it up. It’d be in JCICS if so, I expect.
>>>>
>>>> You could probably knock up a script to do that in a couple of hours I
>>>> would think using a GA library to do the mechanics. If you’re not worried
>>>> about high efficiency, you don’t need to do anything fancy with mutation
>>>> and crossover of the SMILES strings to ensure you always get a valid
>>>> molecule, you can just give a fitness of 0 if the SMILES parser doesn’t
>>>> like what you give it.
>>>> HTH,
>>>> Dave
>>>>
>>>>
>>>> On Fri, 20 Apr 2018 at 14:45, Nils Weskamp <nils.wesk...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> in general, it might be difficult to come up with a deterministic
>>>>> algorithm that generates exactly one structure for a given fingerprint due
>>>>> to many ambiguities in the process. If you are happy with a more "fuzzy"
>>>>> (approximate / probabilistic) approach, you might want to take a look at
>>>>>
>>>>> https://pubs.acs.org/doi/abs/10.1021/ci600383v
>>>>> https://link.springer.com/article/10.1007/s10822-005-9020-4
>>>>>
>>>>> Given this task, I would probably start with a large database of known
>>>>> compounds (PubChem, UniChem, GDB17), calculate fingerprints and then do a
>>>>> similarity search with my query fingerprint.
>>>>>
>>>>> Hope this helps,
>>>>> Nils
>>>>>
>>>>>
>>>>> On Fri, Apr 20, 2018 at 3:13 PM, Brian Cole <col...@gmail.com> wrote:
>>>>>
>>>>>> Hi Chem-informaticians:
>>>>>>
>>>>>> I know it has been talked about in the community that fingerprints
>>>>>> are not a way to obfuscate molecules for security, but I don't recall a
>>>>>> paper actually demonstrating reverse engineering of a fingerprint into
>>>>>> a chemical structure. Does anyone know if such a paper exists?
>>>>>>
>>>>>> Code using RDKit to demonstrate the functionality would be an obvious
>>>>>> bonus as well. :-)
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Check out the vibrant tech community on one of the world's most
>>>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>>>> _______________________________________________
>>>>>> Rdkit-discuss mailing list
>>>>>> Rdkit-discuss@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>> --
>>>> David Cosgrove
>>>> Freelance computational chemistry and chemoinformatics developer
>>>> http://cozchemix.co.uk
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
> --
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk