[Adding the mailing list back on] On Mon, Dec 2, 2019 at 3:23 PM Thomas Evangelidis <teva...@gmail.com> wrote:
> > Thank you for the corrections and the explanation! As stated in my > original email, I want to add extra atomic properties, like the partial > charges, as atom invariants and assess whether they are beneficial in terms > of performance. > But if you add partial charges (a floating point number) then essentially every atom is going to end up with its own invariant. That's unlikely to end well. > Therefore, I wanted first to re-implement the standard invariants in > Python in order to have a reference point for comparison later. > Ok, that makes sense. > Your corrections improved the overall agreement, but as you pointed out, > it is not complete. Although with the two example molecules the > substructures and hence the number of 'on' bits are the same, on a large > scale, namely generating fingerprint to train an ML model, the number of > invariant bits is 2166 using the original ECFP atom invariants, while with > user-defined invariants are 2171. In terms of performance, the original is > marginally better. > I'm going to guess, and without info on which molecules are generating different numbers of bits that's all I can do, that this is a result of the different hashing schemes for the atom invariants. If you really want to track down what's going on, you'll have to figure out which molecules are different and share those. > Btw, why did you use > > if(ring_info.NumAtomRings(i)): > descriptors.append(1) > > and not > > descriptors.append(a.IsInRing()) > > ? Your 2 lines do not add any value to the 'descriptors' list if the atom > does not belong to a ring. Is this how it is in the original implementation? > Yeah, I just copied what's in the original implementation. The results should be exactly the same. -greg > >
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss