On 19/11/2019 03:34, Benjamin Datko wrote:
Hello all,

I am curious about how to fold a count vector fingerprint. I understand
that when folding bit vectors, the most common way is to split the
vector in half and apply a bitwise OR operation to the two halves. I
think this is how the function rdkit.DataStructs.FoldFingerprint works
in RDKit; correct me if I am wrong.
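
To make the bit-vector case concrete, here is a minimal pure-Python sketch of folding by OR-ing equal-sized chunks together. The function name fold_bits and the list-of-ints representation are illustrative assumptions; this is not RDKit code and has not been checked against the RDKit source.

```python
def fold_bits(bits, fold_factor=2):
    """Fold a bit vector (list of 0/1 ints) by OR-ing its chunks together.

    Hypothetical helper for illustration only; not part of RDKit.
    """
    if fold_factor < 1 or len(bits) % fold_factor != 0:
        raise ValueError("length must be divisible by fold_factor")
    new_len = len(bits) // fold_factor
    folded = [0] * new_len
    for i, bit in enumerate(bits):
        # Bit i lands on position i modulo the new length, which is the
        # same as stacking the chunks and OR-ing them column by column.
        folded[i % new_len] |= bit
    return folded


# First half [1, 0, 0, 1] OR second half [0, 0, 1, 0]
print(fold_bits([1, 0, 0, 1, 0, 0, 1, 0]))  # [1, 0, 1, 1]
```

Folding this way can only lose information: two distinct set bits may collide on the same position after folding.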

How does RDKit fold count vectors such as AtomPair, Morgan, and
topological torsion fingerprints, or what is the appropriate way to do so?

Can you give us some context? Why do you want to do that?

Maybe you can use the following to create shorter "fingerprints"
for which the Tanimoto distance is still computable (although it
then becomes approximate):

---
Shrivastava, A. (2016).
Simple and efficient weighted minwise hashing.
In Advances in Neural Information Processing Systems (pp. 1498-1506).

https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
---
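
For reference, the quantity that weighted minwise hashing estimates is the weighted Jaccard (generalized Tanimoto) similarity: the sum of element-wise minima divided by the sum of element-wise maxima. Below is a minimal sketch of the exact computation on sparse count vectors stored as {index: count} dicts. It is not an implementation of the hashing scheme from the paper; the function name and the choice to return 0.0 for two empty vectors are assumptions made for illustration.

```python
def weighted_tanimoto(a, b):
    """Exact generalized Tanimoto similarity of two sparse count vectors.

    a and b map index -> non-negative count. Hypothetical helper, not RDKit.
    """
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    # Assumption: two empty vectors are given similarity 0.0 here.
    return numerator / denominator if denominator else 0.0


fp_a = {1: 2, 5: 1}
fp_b = {1: 1, 7: 3}
# minima: 1 + 0 + 0 = 1, maxima: 2 + 1 + 3 = 6
print(weighted_tanimoto(fp_a, fp_b))
```

The distance is then 1 minus this similarity. For vectors whose counts are all 0 or 1, this reduces to the usual Tanimoto coefficient on bit vectors.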

Regards,
F.

I thought about turning the fingerprints into bit vectors using their
respective "AsBitVect" methods and then folding them with
rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
have an "AsBitVect" method
[https://www.rdkit.org/docs/GettingStartedInPython.html].

As an explicit example, the AtomPair fingerprint below is extremely
sparse. Could this AtomPair fingerprint be folded to increase its
density?

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CC1CCCCC1')

# Sparse count-vector AtomPair fingerprint
ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=3)

# GetNonzeroElements() returns a dict of {index: count}
number_of_nonzero_elements = len(ap_fp.GetNonzeroElements())

print((ap_fp.GetLength(), number_of_nonzero_elements))
# Output: (8388608, 9)
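
One possible way to fold such a count vector, by analogy with the bit-vector case, is to map every index to its value modulo the new length and add up the counts that collide. The sketch below works on a plain {index: count} dict like the one returned by GetNonzeroElements(). The function name fold_counts, the sample indices, and the choice of summing rather than taking the maximum are assumptions for illustration; this is not an RDKit API.

```python
def fold_counts(nonzero, new_length):
    """Fold a sparse count vector ({index: count}) down to new_length slots.

    Colliding indices have their counts summed. Hypothetical helper, not RDKit.
    """
    if new_length < 1:
        raise ValueError("new_length must be positive")
    folded = {}
    for idx, count in nonzero.items():
        key = idx % new_length
        folded[key] = folded.get(key, 0) + count
    return folded


# Made-up indices and counts standing in for ap_fp.GetNonzeroElements()
sparse = {3: 2, 11: 1, 8: 4}
print(fold_counts(sparse, 8))  # {3: 3, 0: 4}
```

Summing keeps the total count unchanged, whereas taking the maximum of colliding counts would be the closer analogue of the bitwise OR; which one is appropriate depends on the similarity measure used afterwards.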

Very Respectfully,

Ben
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

