Hi! I was looking at bug #3508648 and it turns out that it segfaults in expandKekulize() due to stack overflow. The routine in question uses recursive algorithm to assign bond orders for aromatic systems. For large molecules (1bxr has almost 50k atoms) it means that the stack can grow quite a bit because of the recursive function calls and local variables.
At first I started by eliminating some of the local variables which are not actually used for anything but getting indexes for atoms in the bond: // Get the bond and its atoms - OBBond *bond = mol->GetBond(bond_idx); - OBAtom *atom1 = bond->GetBeginAtom(); - OBAtom *atom2 = bond->GetEndAtom(); - int idx1 = atom1->GetIdx(); - int idx2 = atom2->GetIdx(); + size_t idx1 = mol->GetBond(bond_idx)->GetBeginAtom()->GetIdx(); + size_t idx2 = mol->GetBond(bond_idx)->GetEndAtom()->GetIdx(); Still crashed for default max stack size. I'm not sure if this kind of optimization is useful at all since compiler should realize that bond, atom1 and atom2 is used only to get atom indexes. On the other hand we are asking to create variables on the stack and once they are there, they stay around until they go out of the scope (which is basically at the end of kekulization). I'm not sure which of these is true. Any comments on this? Now I got to the real culprits of this issue - previousState and previousBondState. They are vectors of integers holding the state of each atom and bond in molecule at the beginning of each iteration. Putting them on heap allows to process 1bxr.pdb without increasing stack size limits. // Remember the current state so that we can backtrack if the attempt fails - vector<int> previousState = atomState; // Backup the atom states - vector<int> previousBondState = bondState; // ... and the bond states + // Keep previous states on heap to avoid stack overflows for large molecules + vector<int>* ppreviousState = new vector<int>(atomState); // Backup the atom states + vector<int>* ppreviousBondState = new vector<int>(bondState); // ... and the bond states And deleting them just before expandKekulize() returns. Performance wise there is no significant difference between previous version and this when looking at times required to convert 1bxr or fullerene with babel. Since state vectors can be quite large, I think it is appropriate to keep them in heap. An alternative could be to store only differences in previous state, that could be a STL pair for bond and the two atoms, which should be considerably smaller than full state vectors. This solution allows to process a lot larger molecules as before, but still somewhere we will hit the stack size limit because of the recursive algorithm. Making smarter/iterative algorithm could allow to work around this (or just raising the stack size limit). But I'm not sure if this would be useful to anyone at the moment. I don't know of a way to get the stack size limit from C++ program, but that would allow for a meaningful error message instead of just segfault. Any suggestions, comments? I'm committing this as is if there are no objections. Reinis ------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ _______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel