Hi!

I was looking at bug #3508648 and it turns out that it segfaults in
expandKekulize() due to stack overflow. The routine in question uses
recursive algorithm to assign bond orders for aromatic systems. For
large molecules (1bxr has almost 50k atoms) it means that the stack can
grow quite a bit because of the recursive function calls and local
variables.

At first I started by eliminating some of the local variables which are
not actually used for anything but getting indexes for atoms in the
bond:
     // Get the bond and its atoms
-    OBBond *bond = mol->GetBond(bond_idx);
-    OBAtom *atom1 = bond->GetBeginAtom();
-    OBAtom *atom2 = bond->GetEndAtom();
-    int idx1 = atom1->GetIdx();
-    int idx2 = atom2->GetIdx();
+    size_t idx1 = mol->GetBond(bond_idx)->GetBeginAtom()->GetIdx();
+    size_t idx2 = mol->GetBond(bond_idx)->GetEndAtom()->GetIdx();

Still crashed for default max stack size. I'm not sure if this kind of
optimization is useful at all since compiler should realize that bond,
atom1 and atom2 is used only to get atom indexes. On the other hand we
are asking to create variables on the stack and once they are there,
they stay around until they go out of the scope (which is basically at
the end of kekulization). I'm not sure which of these is true.

Any comments on this?

Now I got to the real culprits of this issue - previousState and
previousBondState. They are vectors of integers holding the state of
each atom and bond in molecule at the beginning of each iteration.
Putting them on heap allows to process 1bxr.pdb without increasing stack
size limits.

     // Remember the current state so that we can backtrack if the attempt fails
-    vector<int> previousState         = atomState;     // Backup the atom 
states
-    vector<int> previousBondState     = bondState;     // ... and the bond 
states
+    // Keep previous states on heap to avoid stack overflows for large 
molecules
+    vector<int>* ppreviousState     = new vector<int>(atomState);      // 
Backup the atom states
+    vector<int>* ppreviousBondState = new vector<int>(bondState);      // ... 
and the bond states

And deleting them just before expandKekulize() returns.

Performance wise there is no significant difference between previous
version and this when looking at times required to convert 1bxr or
fullerene with babel. Since state vectors can be quite large, I think it
is appropriate to keep them in heap. An alternative could be to store
only differences in previous state, that could be a STL pair for bond
and the two atoms, which should be considerably smaller than full state
vectors.

This solution allows to process a lot larger molecules as before, but
still somewhere we will hit the stack size limit because of the
recursive algorithm. Making smarter/iterative algorithm could allow to
work around this (or just raising the stack size limit). But I'm not
sure if this would be useful to anyone at the moment.

I don't know of a way to get the stack size limit from C++ program, but
that would allow for a meaningful error message instead of just
segfault.

Any suggestions, comments? I'm committing this as is if there are no
objections.


Reinis


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Reply via email to