Thanks for this writeup, Rocco. You're right that this information isn't collected anywhere that's easy to find and understand. That's one of those gaps in the documentation that I should eventually address. This is already a pretty good start though.
-greg

On Thu, Sep 8, 2016 at 9:37 PM, Rocco Moretti <rmoretti...@gmail.com> wrote:

> Greg can correct me if I'm wrong(1), but in RDKit there are actually three
> "levels" of hydrogens:
>
> * "Physical" hydrogens, which are represented as actual, independent atoms
> in the atom graph. ("Physical hydrogens" is what I'm calling them - I don't
> know if RDKit has an official term for them.)
>
> * "Explicit" hydrogens, which are represented as a numeric annotation on
> their attached heavy atom. (And *not* as a separate atom object.)
>
> * "Implicit" hydrogens, which aren't actually represented anywhere, but
> are calculated from the standard valence of the heavy atom and how many of
> those valences are occupied by actual atoms and explicit hydrogens.
>
> Generally, except for some coordinate calculations, RDKit seems to be
> built around working with molecules with explicit or implicit hydrogens.
> This is why when you read in a molecule, RDKit normally removes any
> physical hydrogens. (Note that for most file reading code there's a
> removeHs parameter you can set to False to change this behavior, and read
> explicitly listed hydrogens as physical hydrogens.)
>
> By default "removing hydrogens" means turning them into implicit
> hydrogens(2), but the RemoveHs() function has an "updateExplicitCount"
> parameter which will cause the removed hydrogens to be turned into explicit
> hydrogens instead. The standard MOL file loading code doesn't use this
> option, though, so the hydrogens in the molecule are usually converted into
> implicit ones when you read things in.
>
> AddHs(), of course, turns explicit and implicit hydrogens into physical
> hydrogens. (Though the "explicitOnly" parameter can be used to control
> this.) It does annotate whether these physical hydrogens came from the
> implicit or the explicit pool, so you can round trip things through AddHs()
> and RemoveHs() appropriately. (There's also an "implicitOnly" parameter on
> RemoveHs() which will only remove those hydrogens.)
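The three levels described above can be poked at from Python. This is only a minimal sketch, assuming a reasonably recent RDKit Python build; methanol and the variable names are just illustration, and the exact split between explicit and implicit counts that gets printed may depend on the RDKit version.

```python
from rdkit import Chem

# Implicit: a plain "C" carries no H annotation; the count is derived from valence.
implicit = Chem.MolFromSmiles("CO")
# Explicit: the bracket atom stores its H count as an annotation on the carbon.
explicit = Chem.MolFromSmiles("[CH3]O")

for label, mol in (("implicit", implicit), ("explicit", explicit)):
    carbon = mol.GetAtomWithIdx(0)
    # Both molecules have only 2 atoms in the graph; the Hs live on the carbon.
    print(label, mol.GetNumAtoms(), carbon.GetNumExplicitHs(),
          carbon.GetNumImplicitHs(), carbon.GetTotalNumHs())

# Physical: AddHs() turns the annotated/derived Hs into real atoms in the graph.
physical = Chem.AddHs(implicit)
print("physical", physical.GetNumAtoms())

# RemoveHs() folds them back; updateExplicitCount=True asks for the removed
# Hs to be recorded as explicit counts rather than left implicit.
stripped = Chem.RemoveHs(physical)
stripped_explicit = Chem.RemoveHs(physical, updateExplicitCount=True)
print("stripped", stripped.GetNumAtoms(),
      stripped_explicit.GetAtomWithIdx(0).GetNumExplicitHs())
```

In every case the total hydrogen count on the carbon stays the same; only where that count is stored changes.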
>
> Regards,
> -Rocco
>
> (1) I don't think the RDKit hydrogen model has ever been formalized in one
> place for user-facing documentation, so this is the understanding I've
> gotten from banging my head against various hydrogen-related issues.
>
> (2) There are special complications here: certain structures, such as
> imidazole, need a physical or explicit hydrogen on one of the nitrogens in
> order to Kekulize properly. If you're implicit only, the RDKit sanitizer
> will choke. Thus, there's special casing in the various Add/RemoveHs
> functions to avoid implicit-izing these critical hydrogens.
>
> On Thu, Sep 8, 2016 at 1:46 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
> wrote:
>
>> On 09/08/2016 10:25 AM, Greg Landrum wrote:
>> ...
>> > Why do you want 2D drawings that include H atoms?
>>
>> On the subject of H atoms: when I read in a MOL file that has them, I
>> need to explicitly call AddHs() in order to have them drawn.
>>
>> Question: do they actually get stripped off by the reader and re-added
>> by AddHs()? Or are they there "hidden" somehow and AddHs() just
>> "unhides" them?
>>
>> TIA
>> --
>> Dimitri Maziuk
>> Programmer/sysadmin
>> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
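Two of the quoted points are easy to check directly: whether the MOL reader strips listed hydrogens, and the imidazole footnote. A rough sketch, assuming the RDKit Python wrappers; the MOL block is generated from a SMILES rather than read from a real file, so it has no meaningful coordinates, and the variable names are only for illustration.

```python
from rdkit import Chem

# Build a MOL block that lists every hydrogen as its own atom (methanol, 6 atoms).
# No coordinates are generated; the block only serves to exercise the reader.
mol_block = Chem.MolToMolBlock(Chem.AddHs(Chem.MolFromSmiles("CO")))

# Default reader behavior: the listed hydrogens are removed from the graph.
default_read = Chem.MolFromMolBlock(mol_block)
# removeHs=False keeps them as real ("physical") atoms.
kept_read = Chem.MolFromMolBlock(mol_block, removeHs=False)

print(default_read.GetNumAtoms())               # 2
print(kept_read.GetNumAtoms())                  # 6
# AddHs() on the stripped molecule creates new H atoms from the stored counts.
print(Chem.AddHs(default_read).GetNumAtoms())   # 6

# Footnote (2): imidazole needs the H on one nitrogen to be stated.
with_h = Chem.MolFromSmiles("c1cnc[nH]1")
without_h = Chem.MolFromSmiles("c1cncn1")  # fails to kekulize, returns None
print(with_h is not None, without_h is None)
```

So with the default settings the hydrogens really are stripped by the reader, and AddHs() builds new atoms rather than unhiding old ones. By default AddHs() does not assign coordinates to the atoms it adds (its addCoords argument defaults to False), which matters if the molecule is going to be drawn or used in 3D.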