Hi Dimitri,
First a simple operational point: it's cleaner to use
Chem.Descriptors.MolWt than Chem.rdMolDescriptors._CalcMolWt:
In [6]: from rdkit.Chem import Descriptors
In [7]: m = Chem.MolFromSmiles('CCO')
In [8]: Descriptors.MolWt(m)
Out[8]: 46.069
The underlying code that gets called is the same. Functions/variables whose
names begin with an underscore are usually not intended for use in client
code.
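As a sanity check on where the 46.069 comes from, here is a minimal sketch that redoes the average-weight sum by hand. It assumes the three-decimal atomic masses quoted further down in this thread (1.008, 12.011, 14.007, 15.999); the dictionary and the helper name average_mol_wt are made up for illustration and are not part of the RDKit API.

```python
from decimal import Decimal

# Three-decimal average atomic masses, as quoted in this thread.
# Decimal is used so the sums are exact in base 10.
AVERAGE_MASS = {
    "H": Decimal("1.008"),
    "C": Decimal("12.011"),
    "N": Decimal("14.007"),
    "O": Decimal("15.999"),
}


def average_mol_wt(counts):
    """Sum the average atomic masses for an {element: count} mapping."""
    return sum((AVERAGE_MASS[el] * n for el, n in counts.items()), Decimal(0))


# Ethanol (CCO) is C2H6O; C3H7NO2 is the formula from the quoted sum.
ethanol = average_mol_wt({"C": 2, "H": 6, "O": 1})
c3h7no2 = average_mol_wt({"C": 3, "H": 7, "N": 1, "O": 2})
print(ethanol, c3h7no2)
```

With these inputs the ethanol sum agrees with the MolWt output above, and the second sum agrees with the 89.094 in the quoted message.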
On Fri, Dec 13, 2013 at 12:39 AM, Dimitri Maziuk <[email protected]> wrote:
> On 12/12/2013 05:00 PM, David Hall wrote:
> > Looking at
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/atomic_data.cpp
> , the sixth column has the masses to the third decimal place.
> > Using these,
> > 7*1.008+3*12.011+14.007+2*15.999 =
> > 89.094
> >
> >
> > So, just a matter of where one rounds. Given the uncertainties in
> http://iupac.org/publications/pac/78/11/2051/ , I would argue that you
> are concerned with precision at a level beyond what is currently accepted.
>
> I'm told we have mass spec people using our database and they care.
>
If the numbers are being used for the interpretation of mass spec results,
you almost certainly should be using exact masses, not the average
molecular weight:
In [9]: Descriptors.ExactMolWt(m)
Out[9]: 46.041864812
The ExactMolWt calculation uses the IUPAC atomic mass for the most common
isotope of each atom, unless you explicitly specify the isotope, and so
allows accurate calculations with more significant figures:
In [11]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[12CH3][12CH2][16OH]'))
Out[11]: 46.041864812
In [12]: Descriptors.ExactMolWt(Chem.MolFromSmiles('[13CH3][12CH2][16OH]'))
Out[12]: 47.045219652
(Unspecified Hs are isotope 1 in these calculations)
The RDKit isotope masses are from the BODR (Blue Obelisk Data Repository),
which takes the data from IUPAC.
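To make the exact-mass idea concrete, here is a rough sketch that adds up isotope masses by hand for ethanol, with and without one 13C. The isotope masses are approximate values typed in for illustration (an assumption, not read out of the RDKit tables), and exact_mass is a hypothetical helper, not an RDKit function.

```python
# Approximate isotope masses, entered by hand for illustration.
# These are assumed values, not taken from RDKit's data tables.
ISOTOPE_MASS = {
    ("C", 12): 12.0,
    ("C", 13): 13.003354838,
    ("H", 1): 1.007825032,
    ("O", 16): 15.994914620,
}


def exact_mass(isotope_counts):
    """Sum isotope masses for a {(element, mass_number): count} mapping."""
    return sum(ISOTOPE_MASS[iso] * n for iso, n in isotope_counts.items())


# Ethanol with the most common isotope of every atom.
ethanol = exact_mass({("C", 12): 2, ("H", 1): 6, ("O", 16): 1})

# The same molecule with one carbon replaced by 13C.
ethanol_13c = exact_mass(
    {("C", 13): 1, ("C", 12): 1, ("H", 1): 6, ("O", 16): 1}
)

print(round(ethanol, 6), round(ethanol_13c, 6))
```

Within the precision of the hand-entered masses, the two sums should line up with the ExactMolWt outputs above, and the difference between them is just the 13C/12C mass difference.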
> Besides, if the Cs are 12.011 vs 12.0107, then all it takes is 17
> carbons to push you over .005 -- so for molecules with a couple of
> hundred atoms you might as well round to the nearest integer.
You're absolutely correct. It turns out that this reflects reality if you
are using average molecular weights.
The RDKit values are intended to be equal to those recommended by IUPAC
(http://iupac.org/publications/analytical_compendium/Cha01sec8.pdf), which
reports them to 5 sig figs when appropriate. Those values have an
uncertainty of +/- 1 in the last digit. The value of 12.0107 that you quote
above has an uncertainty of +/- 8 in the last digit (the uncertainties are
in the report that David references).
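The arithmetic behind the "17 carbons" remark, and the comparison with the stated uncertainty, can be sketched as follows. The only inputs are the two carbon values and the +/- 8 in the last digit mentioned above; the variable names are just for illustration.

```python
from decimal import Decimal

# The two carbon atomic weights being compared in this thread.
c_three_decimals = Decimal("12.011")
c_four_decimals = Decimal("12.0107")

# Uncertainty of +/- 8 in the last digit of 12.0107.
c_uncertainty = Decimal("0.0008")

# Shift in molecular weight per carbon atom when switching values.
per_carbon_shift = c_three_decimals - c_four_decimals

# Smallest number of carbons whose accumulated shift exceeds 0.005.
threshold = Decimal("0.005")
carbons_needed = int(threshold / per_carbon_shift) + 1

# Does the 12.011 value fall inside 12.0107 +/- 0.0008?
within_uncertainty = abs(per_carbon_shift) <= c_uncertainty

print(per_carbon_shift, carbons_needed, within_uncertainty)
```

So the accumulated difference does cross .005 at 17 carbons, but the per-atom difference is smaller than the uncertainty quoted for 12.0107 itself.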
> But anyway, thanks -- I was looking for the source of the numbers and
> didn't find any references. There should probably be a pointer to
> atomic_data.cpp somewhere in the docs.
>
Good point; there should be some indication where the data came from.
Do you think it makes more sense to point to the C++ code or to include a
pointer to the source of the data (in this case that IUPAC table)?
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss