On Feb 7, 2017, at 01:17, Curt Fischer <[email protected]> wrote:
> I am confused by this behavior:
>
> >>> labeled_etoh = Chem.MolFromSmiles('C[13C]O')
> >>> print(Chem.MolToSmiles(labeled_etoh))
>
> C[C]O
>
> >>> print(Chem.MolToSmiles(labeled_etoh, isomericSmiles=True))
>
> C[13C]O
>
> 1. Why are there any brackets at all in the first output? Why not just 'CCO'?
The middle atom in "CCO" has two hydrogens. The middle atom in "C[C]O" has no
hydrogens.
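
A quick way to check this yourself is to ask RDKit for the hydrogen count on
the middle atom. This is only a sketch; the helper name middle_atom_h_count is
my own and not part of RDKit, and it assumes the middle atom is atom index 1,
which holds for these three-atom inputs because atoms are numbered in input
order.

```python
from rdkit import Chem


def middle_atom_h_count(smiles):
    """Return the total hydrogen count on atom index 1 of the parsed SMILES.

    Hypothetical helper for illustration; not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)
    return mol.GetAtomWithIdx(1).GetTotalNumHs()


# Bare "C" gets its hydrogens filled in; a bracket atom only has the
# hydrogens that are written inside the brackets.
for smiles in ("CCO", "C[C]O", "C[13C]O", "C[13CH2]O"):
    print(smiles, middle_atom_h_count(smiles))
```

If what you wanted was labeled ethanol, the input should be 'C[13CH2]O'.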
> 2. Is there any documentation anywhere that the "isomericSmiles" argument is
> also an "isotopicSmiles" argument?
I don't believe so. A search via DuckDuckGo of rdkit.org finds only two
irrelevant matches.
> I am also confused about when Chem.MolToSmiles() puts in H atoms in the
> output.
SMILES has a short-hand notation to represent hydrogens. "[CH4]" and "C" are
both methane.
When an atom is described using brackets, the number of hydrogens must be
specified with the H<n> notation. A bracket atom without it has no hydrogens.
When an atom is described without brackets, the number of hydrogens is
based on the permitted valence values. C has a valence of 4, and the middle
carbon of CCO (-C-) has two single bonds, so it gets two implicit hydrogens to
complete the valence.
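
The rule for unbracketed atoms can be sketched in a few lines of plain Python.
This follows the usual SMILES convention (pick the lowest permitted valence
that is at least the sum of the bond orders) and is not RDKit's actual code.
The valence table and the name implicit_h_count are my own assumptions for the
example.

```python
# Permitted valences for atoms that may be written without brackets,
# following the common SMILES "organic subset" convention (an assumption
# for this sketch, not taken from RDKit's source).
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(element, bond_order_sum):
    """Hydrogens implied for an unbracketed atom with the given bond order sum."""
    for valence in DEFAULT_VALENCES[element]:
        if valence >= bond_order_sum:
            # Lowest permitted valence that accommodates the written bonds.
            return valence - bond_order_sum
    # More bonds than any permitted valence: no hydrogens are added.
    return 0


# Middle carbon of CCO: two single bonds, so two hydrogens.
print(implicit_h_count("C", 2))
# Oxygen of CCO: one single bond, so one hydrogen.
print(implicit_h_count("O", 1))
```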
The output mechanism prefers to use the short-hand notation if possible. That
isn't possible if the sum of the hydrogens and bond orders doesn't match one of
the permitted valences, or if the atom has an isotope, charge, chirality, etc.,
which requires the use of []s.
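
Here is a rough sketch of that output decision, again in plain Python and not
RDKit's implementation. The function atom_token and its parameters are
hypothetical. The caller passes in shorthand_h_count, meaning the hydrogen
count a reader would infer if the atom were written without brackets. The
sketch ignores chirality and assumes the symbol is one that may legally appear
without brackets.

```python
def atom_token(symbol, h_count, shorthand_h_count, isotope=0, charge=0):
    """Return the SMILES token for one atom, using brackets only when needed.

    Hypothetical helper for illustration. shorthand_h_count is the number of
    hydrogens implied if the atom were written without brackets.
    """
    needs_brackets = (
        isotope != 0 or charge != 0 or h_count != shorthand_h_count
    )
    if not needs_brackets:
        return symbol

    token = "["
    if isotope:
        token += str(isotope)
    token += symbol
    # Inside brackets the hydrogens must be spelled out: H, H2, H3, ...
    if h_count == 1:
        token += "H"
    elif h_count > 1:
        token += f"H{h_count}"
    if charge > 0:
        token += "+" if charge == 1 else f"+{charge}"
    elif charge < 0:
        token += "-" if charge == -1 else f"-{-charge}"
    return token + "]"


print(atom_token("C", 2, 2))              # short-hand is enough
print(atom_token("C", 0, 2))              # hydrogen count disagrees
print(atom_token("C", 2, 2, isotope=13))  # isotope forces brackets
```

The second call is the C[C]O case from question 1: once the isotope is dropped
the atom still has zero hydrogens, so the brackets have to stay.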
>
> >>> three_hb1 = Chem.MolFromSmiles('C[13CH](O)C[13C](=O)O')
> >>> three_hb2 = Chem.MolFromSmiles('C[13C](O)C[13C](=O)O')
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=True))
>
> C[13CH](O)C[13C](=O)O
>
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=True))
>
> C[13C](O)C[13C](=O)O
>
> >>> print(Chem.MolToSmiles(three_hb1, isomericSmiles=False))
>
> CC(O)CC(=O)O
>
> >>> print(Chem.MolToSmiles(three_hb2, isomericSmiles=False))
>
> C[C](O)CC(=O)O
>
> 3. Why are there no brackets for three_hb1 output, but there are for
> three_hb2?
I think you mean "for the isomericSmiles=False" output? The first three_hb1
output has brackets.
The isotope notation requires []s, so the option of using the short-hand
notation doesn't exist. In that case the number of hydrogens must be specified
as otherwise it means the atom has no hydrogens.
> 4. As far as I can tell, the two three_hb molecules are identical. Why
> aren't all Hs removed during canonicalization?
The second atom in three_hb1 has 1 hydrogen and three single bonds.
The second atom in three_hb2 has 0 hydrogens and three single bonds.
They are different structures so have different SMILES.
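
You can see the difference by adding up the hydrogens RDKit assigns to each
atom. This is a small sketch; total_h_count is my own helper name, not an
RDKit function.

```python
from rdkit import Chem


def total_h_count(smiles):
    """Sum the hydrogen counts over all atoms of the parsed SMILES.

    Hypothetical helper for illustration; not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)
    return sum(atom.GetTotalNumHs() for atom in mol.GetAtoms())


three_hb1 = "C[13CH](O)C[13C](=O)O"
three_hb2 = "C[13C](O)C[13C](=O)O"

# three_hb2 is one hydrogen short because its second atom was written as
# [13C] rather than [13CH].
print(total_h_count(three_hb1), total_h_count(three_hb2))
```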
Cheers,
Andrew
[email protected]
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss