Hi Thomas,

I didn't know how invariants are computed off the top of my head, so I just looked into the code.

If no invariants are provided by the user, the invariants are computed by the getConnectivityInvariants() function (FingerprintUtil.cpp):

void getConnectivityInvariants(const ROMol &mol, std::vector<uint32_t> &invars,
                               bool includeRingMembership) {
  unsigned int nAtoms = mol.getNumAtoms();
  PRECONDITION(invars.size() >= nAtoms, "vector too small");
  gboost::hash<std::vector<uint32_t>> vectHasher;
  for (unsigned int i = 0; i < nAtoms; ++i) {
    Atom const *atom = mol.getAtomWithIdx(i);
    std::vector<uint32_t> components;
    components.push_back(atom->getAtomicNum());
    components.push_back(atom->getTotalDegree());
    components.push_back(atom->getTotalNumHs());
    components.push_back(atom->getFormalCharge());
    int deltaMass = static_cast<int>(
        atom->getMass() -
PeriodicTable::getTable()->getAtomicWeight(atom->getAtomicNum()));
    components.push_back(deltaMass);

    if (includeRingMembership &&
atom->getOwningMol().getRingInfo()->numAtomRings(atom->getIdx())) {
      components.push_back(1);
    }
    invars[i] = vectHasher(components);
  }
}

For each atom, tha invariant has 5 components:

* atom number
* total degree
* total number of hydrogens
* formal charge
* mass difference between the mass of your isotope and the average atomic weight of the element (this will be 0 if you are not using a specific isotope) On top of these 5 components, if includeRingMembership is true (the default) and the atom is part of one or more rings, there will be a 6th component equal to the number of rings that the atom belongs to.

This 5 (or 6)-component vector is then hashed.

If the invariants are provided by the user, they will be used instead.

Cheers,
p.

On 27/11/2019 15:30, Thomas Evangelidis wrote:
This is my 3rd attempt to get an explanation about how these invariants work in the ECFP fingerprint cause I can't find it anywhere in the documentation. I tried the generateAtomInvariant() [see below] and the resulting ECFP bit-vectors had for the same molecules drastically reduced variance, 2360 variant bits without invariants versus 795 with the invariants. Surprisingly, the performance of the ECFP with invariants was better in this dataset in terms of affinity ranking. Can someone please explain what happens when I pass invariants to the AllChem.GetMorganFingerprint() function??? I hope that I will get an answer this time.


        def generateAtomInvariant(mol):
             """ >>> generateAtomInvariant(Chem.MolFromSmiles("Cc1ncccc1"))
        [341294046, 3184205312, 522345510, 1545984525, 1545984525,
        1545984525, 1545984525] """ num_atoms = mol.GetNumAtoms()
             invariants = [0]*num_atoms
             for i,ain enumerate(mol.GetAtoms()):
                 descriptors=[]
                 descriptors.append(a.GetAtomicNum())
                 descriptors.append(a.GetTotalDegree())
                 descriptors.append(a.GetTotalNumHs())
                 descriptors.append(a.IsInRing())
                 descriptors.append(a.GetIsAromatic())
                 invariants[i]=hash(tuple(descriptors))&0xffffffff return 
invariants

        And then generate the fingerprint like this:

            fp = AllChem.GetMorganFingerprint(mol, radius=3,
            invariants=generateAtomInvariant(mol))



--

======================================================================

Dr. Thomas Evangelidis

Research Scientist

IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague, Czech Republic

  &
CEITEC - Central European Institute of Technology <https://www.ceitec.eu/>, Brno, Czech Republic

email: teva...@gmail.com <mailto:teva...@gmail.com>, Twitter: tevangelidis <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>

website: https://sites.google.com/site/thomasevangelidishomepage/






_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to