On Mon, Jun 21, 2010 at 6:43 PM, Craig A. James <cja...@emolecules.com> wrote: > The recent discussion of SSSR bugs prompted me to dig back through my emails > to one I wrote on 20 November 2007 to the BlueObelisk mailing list. Here it > is in its entirety. > > Craig > > -------------------------------- > > > Andrew suggested an algorithmic description of aromaticity. I think this is > a really good idea. > > I propose, for argument's sake, that we should be using a LSSR, "Largest Set > of Smallest Rings." It would go like this: A breadth-first search for rings, > which terminates when all ring atoms have been included in at least one ring, > but only after ALL cyclic paths of a given size have been enumerated. Since > the LSSR is computed specifically for aromaticity detection, it would stop at > some reasonable size, say 8-atom rings. > > For example, cubane's LSSR would have six (not five or four) 4-atom rings, > because when you get the the four-atom-ring step of the algorithm, you don't > stop until you've added all four-atom cyclic paths. Similarly, the LSSR for > a prism (three square faces joined in a triangle) would have five rings (two > 3-atom rings and three 4-atom rings, whereas the SSSR for that structure > would only have four rings).
Implementing this shouldn't be to hard. Changing the last loop in OBRingSearch should be all. It currently checks if a ring is contained in the union of all other rings. Checking with a single ring should give the behavior you describe (assuming we don't check frj). The SSSR should still be available though. Is there a reference for the LSSR or is it a suggestion for a name? If there is no reference there might be some unforeseen cases. For example, in cubane there are 3 6-atom-rings not fully overlapping with a single 4-atom-ring. How do we decide if the ring should not go in the set? (note: if you check all 4-rings, the 6th 4-ring wouldn't be found) This may not be important for aromaticity though. > Unlike SSSR, this is a very easy algorithm to code, and easy to explain. It's > computation time is O(R) where R is the number of rings, and there is no > arbitrariness to the LSSR; no matter how you compute it, you always get the > same answer. > > For most "ordinary" molecules, the LSSR is the same as the SSSR. The LSSR > and SSSR only differ with "cage" structures, where the rings themselves form > rings, i.e. when there exists a set of R rings that "covers" all atoms, but R > is fewer than the number of bonds that must be broken to make the structure > acyclic. And in these cases, the choice of rings for the SSSR is arbitrary, > whereas the LSSR is not. > > For aromaticity detection, this gives you an algorithmic description of rings > that is very suitable for the aromaticity definition. It would go something > like this: > > Compute the LSSR > Sort the LSSR list smallest-to-largest > for each atom in the molecule that isn't marked "aromatic" yet { > for each ring in the LSSR (ordered smallest to biggest) that atom is in { > if that ring is aromatic { > mark all of its atoms "aromatic" > } > else { > for each fused neighbor ring in the LSSR { > if the two rings together are aromatic { > mark all atoms in the two rings aromatic > } > else { > for each ring of the LSSR fused to either of the pair { > if the three rings together are aromatic { > mark all atoms in all three rings aromatic > } > } > } > } > } > } > > > I think that's about it. If I understand the chemistry, you never have to go > to four rings (more than 14 atoms) to discover aromaticity. Or we could just > decide that was the OpenSMILES rule, even if there are obscure, rare cases > for which it isn't true. > > In extreme cases (fullerenes), this algorithm is still fast, O(nrings), > because you stop at three rings. Something like 760 steps per ring in a > fullerene, but it's still linear, not polynomial or exponential. And it's > only something like 760 steps if the thing turns out to be completely > non-aromatic. In a real fullerene, it would decide all the atoms were > aromatic in the first pass of the outer loop, and be finished. > > Craig > > ------------------------------------------------------------------------------ > ThinkGeek and WIRED's GeekDad team up for the Ultimate > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the > lucky parental unit. See the prize list and enter to win: > http://p.sf.net/sfu/thinkgeek-promo > _______________________________________________ > OpenBabel-Devel mailing list > OpenBabel-Devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/openbabel-devel > ------------------------------------------------------------------------------ ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo _______________________________________________ OpenBabel-Devel mailing list OpenBabel-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-devel