On 11 October 2010 21:18, Chris Morley <[email protected]> wrote:
> On 11/10/2010 20:21, Noel O'Boyle wrote:
>>
>> I went through a large dataset of PubChem 3D structures looking for
>> implicit H failures (removing, then adding Hs). 1/3 of the failures
>> are due to the following in atomtyp.txt:
>>
>> INTHYB [$([#6]([#8D1])[#8D1])] 2 #sp2 carbon
>>
>> This makes any C attached to two Os turn into sp2...even in geminal
>> diols, for example, ClC(Cl)(Cl)C(O)O. According to wikipedia, these
>> don't tend to last long but they are in PubChem (whether errors or
>> not, I can't say).
>>
>> If it's commented out, then it works fine for these cases.
>>
>> Is there any reason for this rule (it seems to date from the early
>> days)? Perhaps it's to correct ligand structures from the PDB where
>> all examples of this indicate COO-? If so, maybe the PDB cases are
>> better handled in the code using the molecular geometry...?
>
> I have always regarded the implicit valency model as unsatisfactory, and
> maybe it has become a bit messed up over the years. Is it currently used for
> anything other than recognizing when a hydrogen could or should be added to
> an atom?

But this is already a substantial part, isn't it, as it determines
whether you can read a SMILES string correctly. In general it's
working quite well - there's a whole lot of patterns required to
handle nitrogens though.

> A simpler and more obvious model for this purpose has essentially a single
> IMPVAL for each charge state of the molecule. (Only if you are interested in
> radicals or hydrogen on the higher valency states of second row elements do
> you need another rule for each higher valence.) There is no need for any
> skilful fine tuning. It is more maintainable and will be faster because not
> so many SMARTS patterns need to be matched. Up to now, it has worked for
> everything I've tried (although this is not very extensive), except with
> test_formula, where the fault is in a couple of erroneous results in
> formularesults.txt, which at least shows the old model was error-prone and
> needs some more tweaking of phosphate structures.
>
> There may be other side effects I'm not aware of. Just before a release is
> not a good time to commit something like this (5 years ago would have been
> better), so I've just attached a patch (changes to 11 code lines), if you
> want to try it.

I'll check it out...although I'm cautious also.

> Chris

_______________________________________________
OpenBabel-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
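
For illustration only, the "single IMPVAL for each charge state" idea quoted above could be sketched roughly as follows. This is not the attached patch, which is not reproduced in this message, and it is not Open Babel code: the table `DEFAULT_VALENCE` and the function `implicit_h_count` are hypothetical names, and the table holds ordinary textbook valences for a handful of elements. The point it tries to show is that the geminal diol carbon in ClC(Cl)(Cl)C(O)O and a carboxylate carbon are told apart by bond orders and formal charge alone, without a SMARTS pattern.

```python
# Hypothetical lookup: one default valence per (element, formal charge).
# Assumption: textbook valences, limited to the atoms used below.
DEFAULT_VALENCE = {
    ("C", 0): 4,
    ("N", 0): 3, ("N", 1): 4,
    ("O", 0): 2, ("O", -1): 1,
    ("Cl", 0): 1,
}


def implicit_h_count(element, charge, bond_orders):
    """Return how many hydrogens fill the default valence (never negative).

    bond_orders lists the orders of the bonds to heavy-atom neighbours.
    """
    valence = DEFAULT_VALENCE[(element, charge)]
    return max(0, valence - sum(bond_orders))


# Geminal diol carbon in ClC(Cl)(Cl)C(O)O: single bonds to C, O and O.
diol_carbon_h = implicit_h_count("C", 0, [1, 1, 1])         # 1
diol_oxygen_h = implicit_h_count("O", 0, [1])               # 1

# Carboxylate carbon in C(=O)[O-]: single to C, double to O, single to O-.
carboxylate_carbon_h = implicit_h_count("C", 0, [1, 2, 1])  # 0
carboxylate_oxygen_h = implicit_h_count("O", -1, [1])       # 0

print(diol_carbon_h, diol_oxygen_h,
      carboxylate_carbon_h, carboxylate_oxygen_h)
```

A real implementation would also need the extra rules mentioned in the quoted text (radicals, higher valence states), which this sketch leaves out.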
