Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
It looks like AddHs should be deterministic, in that it always loops through the existing non-hydrogen atoms in their internal order, adding Hs to each in turn.

https://github.com/rdkit/rdkit/blob/ffc123a6659705adae33a6f5bf3913d65aa7b54d/Code/GraphMol/AddHs.cpp

Steve

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
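One way to check the AddHs behaviour locally is sketched below. It assumes RDKit is installed; ethanol ("CCO") is an arbitrary example and `bond_table` is a hypothetical helper, not part of RDKit. It only compares two runs within one session and one RDKit version, so it says nothing about stability across releases.

```python
from rdkit import Chem


def bond_table(mol):
    """Return (bond index, begin atom index, end atom index) for every bond."""
    return [(b.GetIdx(), b.GetBeginAtomIdx(), b.GetEndAtomIdx())
            for b in mol.GetBonds()]


smiles = "CCO"  # ethanol, chosen only for illustration

mol = Chem.MolFromSmiles(smiles)
mol_h = Chem.AddHs(mol)

# Repeat the same steps from scratch and compare the resulting bond tables.
mol_h_again = Chem.AddHs(Chem.MolFromSmiles(smiles))
same_bonds = bond_table(mol_h) == bond_table(mol_h_again)

# Heavy atoms should keep their original indices ...
heavy_unchanged = all(
    mol.GetAtomWithIdx(i).GetAtomicNum() == mol_h.GetAtomWithIdx(i).GetAtomicNum()
    for i in range(mol.GetNumAtoms())
)
# ... and the added hydrogens should occupy the indices after them.
hs_appended = all(
    mol_h.GetAtomWithIdx(i).GetAtomicNum() == 1
    for i in range(mol.GetNumAtoms(), mol_h.GetNumAtoms())
)

print(same_bonds, heavy_unchanged, hs_appended)
```

Storing the output of `bond_table` alongside the SMILES and comparing it again in a later session would be a cheap guard against silent changes.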
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On 10/03/2018 03:23 PM, Peter St. John wrote:
> Ah, well I suppose the follow up question is then does 'AddHs' add
> hydrogens in a deterministic fashion?

It should; what's not guaranteed is that it will be the right order. Obviously, if (using my previous example) L- and D-alanine are the "same molecule" for your purposes, then it doesn't matter. If it does matter, then ALATIS (the link I sent earlier) is the best option that I know of.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
Ah, well I suppose the follow-up question is then: does 'AddHs' add hydrogens in a deterministic fashion? If I have a canonicalized SMILES and do

    mol = Chem.MolFromSmiles(SMILES)
    molH = Chem.AddHs(mol)

and then store information about the bonds in molH, should those be relatively consistent if I run the same code later? My limited experiments seem to indicate they are, but I'm not sure if that persists across Python sessions or different hardware.

Thanks again!
-- Peter
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On Wed, 3 Oct 2018 17:26:24 +0200 Greg Landrum wrote:
> Yep good point.
> Though you can opt to keep the Hs if you want, that is not the default
> behavior.

;) I work for NMR people, we get very attached to our protons.

Seriously though, I forget whether it was rdkit or openbabel, but back when I was testing them I managed to read an L-alanine MOL in and get a D-alanine InChI string out in one of them.

--
Dmitri Maziuk
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
Yep, good point. Though you can opt to keep the Hs if you want, that is not the default behavior.

On Wed, 3 Oct 2018 at 17:07, Dmitri Maziuk via Rdkit-discuss wrote:
> ... provided your molecule has no protons and/or you don't removeH/addH in
> the process.
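For reference, opting in to keeping the hydrogens when reading a Mol block can be sketched as follows. This assumes RDKit is installed; the ethanol input is illustrative only, and the Mol block is generated in place rather than read from a real file.

```python
from rdkit import Chem

# Build a molecule with explicit hydrogens and write it out as a Mol block.
mol_h = Chem.AddHs(Chem.MolFromSmiles("CCO"))  # ethanol, illustrative only
mol_block = Chem.MolToMolBlock(mol_h)

# Default behaviour: hydrogens are stripped on input, so only heavy atoms remain.
without_hs = Chem.MolFromMolBlock(mol_block)

# Opting in to keep them: the atoms then follow the order of the Mol block.
with_hs = Chem.MolFromMolBlock(mol_block, removeHs=False)

print(without_hs.GetNumAtoms(), with_hs.GetNumAtoms())
```

With `removeHs=False` the atom indices line up one-to-one with the atom lines of the input block, which is what index-based annotations on hydrogens would need.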
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On Wed, 3 Oct 2018 06:21:06 +0200 Greg Landrum wrote:
> The atom ordering in the RDKit molecule created from a SMILES or Mol block
> will always be the same and will correspond to the ordering of the atoms
> in the input

... provided your molecule has no protons and/or you don't removeH/addH in the process.

--
Dmitri Maziuk
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On Tue, Oct 2, 2018 at 10:32 PM Peter St. John wrote:
> If I store a molecule as a SMILES string, along with relevant information
> about different bonds, is it safe to annotate those bond entries by bond
> index?

This has already been answered (yes, you can), but just to provide a bit more detail:

The ordering of atoms/bonds in the RDKit molecule that results from reading a particular input is deterministic.

The atom ordering in the RDKit molecule created from a SMILES or Mol block will always be the same and will correspond to the ordering of the atoms in the input.

The bond ordering in the RDKit molecule created from a SMILES or Mol block will always be the same. The ordering of bonds from a Mol file corresponds to the ordering in the input file. The ordering from SMILES is a bit more complicated.[1]

For the sake of completeness: the ordering of atoms and bonds from any of the other currently supported input types will also always be the same.

-greg

[1] Here's the story on the ordering of bonds read from SMILES:
- Non "ring closure" bonds appear in the order in which they appear in the input SMILES.
- "Ring closure" bonds appear at the end of the set of bonds. Their ordering is non-trivial to describe, but it is deterministic.

Here's a relatively simple example demonstrating this:

    In [6]: m = Chem.MolFromSmiles('C12ON1.F2')

    In [7]: m.Debug()
    Atoms:
        0 6 C chg: 0 deg: 3 exp: 3 imp: 1 hyb: 4 arom?: 0 chi: 0
        1 8 O chg: 0 deg: 2 exp: 2 imp: 0 hyb: 4 arom?: 0 chi: 0
        2 7 N chg: 0 deg: 2 exp: 2 imp: 1 hyb: 4 arom?: 0 chi: 0
        3 9 F chg: 0 deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
    Bonds:
        0 0->1 order: 1 conj?: 0 aromatic?: 0
        1 1->2 order: 1 conj?: 0 aromatic?: 0
        2 2->0 order: 1 conj?: 0 aromatic?: 0
        3 3->0 order: 1 conj?: 0 aromatic?: 0
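The same information can be pulled out programmatically instead of reading the Debug() dump. The sketch below assumes RDKit is installed; `bond_pairs` is a hypothetical helper name, and the exact begin/end orientation of each bond may vary between RDKit versions, so the comparison that matters is between two parses of the same string.

```python
from rdkit import Chem


def bond_pairs(smiles):
    """Parse a SMILES and list its bonds as (begin, end) atom-index pairs,
    in bond-index order."""
    mol = Chem.MolFromSmiles(smiles)
    return [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]


first = bond_pairs("C12ON1.F2")
second = bond_pairs("C12ON1.F2")

# Print one line per bond: bond index, begin atom, end atom.
for idx, (begin, end) in enumerate(first):
    print(idx, begin, end)

print("identical on re-parse:", first == second)
```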
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
On 10/02/2018 03:32 PM, Peter St. John wrote:
> I.e., if I create a new rdkit Molecule with rdkit.Chem.MolFromSmiles(xxx),
> will the bond ordering always be the same? If not, does anyone know a
> robust way of specifying a bond within a molecule as a string-based
> representation?

https://www.nature.com/articles/sdata201773

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
Hi Peter and Nils,

To supplement Nils' comment, I'd like to add that during writing, neither the atom nor the bond order of the Mol is changed, but the canonical atom mapping is saved in the molecular property "_smilesAtomOutputOrder". This does not include bonds, though. The bond order shouldn't change, but if you wish to be safe, it is best to save the two atom indices instead of the bond idx itself.

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
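A minimal sketch of that suggestion, assuming RDKit is installed: bonds are keyed by a sorted pair of atom indices, and "_smilesAtomOutputOrder" is used to carry those keys over to a molecule re-read from the written SMILES. The input "OCC" and the annotation strings are placeholders, and reading the property via GetProp as a bracketed list string is an assumption about how this RDKit build exposes it.

```python
import ast

from rdkit import Chem

mol = Chem.MolFromSmiles("OCC")  # ethanol written oxygen-first, illustrative only

# Annotate bonds by a sorted pair of atom indices rather than by bond index.
# The annotation values are placeholders, not real data.
annotations = {
    tuple(sorted((b.GetBeginAtomIdx(), b.GetEndAtomIdx()))): f"bond-{b.GetIdx()}"
    for b in mol.GetBonds()
}

# Writing SMILES leaves `mol` untouched but records the output atom order.
smiles = Chem.MolToSmiles(mol)
output_order = list(ast.literal_eval(mol.GetProp("_smilesAtomOutputOrder")))

# Atom i of the re-parsed molecule corresponds to atom output_order[i] of `mol`.
mol2 = Chem.MolFromSmiles(smiles)
carried_over = {}
for b in mol2.GetBonds():
    old_pair = tuple(sorted((output_order[b.GetBeginAtomIdx()],
                             output_order[b.GetEndAtomIdx()])))
    carried_over[b.GetIdx()] = annotations[old_pair]

print(smiles, output_order, carried_over)
```

`GetBondBetweenAtoms(i, j)` goes the other way, returning the bond object for a stored atom pair.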
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
Awesome, thanks for the tip!

Connor, that is also a great idea; I didn't know about atom-mapped SMILES strings. That would definitely be a good method if the indexing algorithm changes across rdkit versions.

Thanks!
-- Peter
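For readers unfamiliar with atom-mapped SMILES, the idea can be sketched like this, assuming RDKit is installed. The input "OCC" is illustrative, and numbering atoms as index + 1 is a choice made here, not something prescribed in the thread.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("OCC")  # illustrative input

# Tag each atom with a map number; 0 means "unset", so use index + 1.
for atom in mol.GetAtoms():
    atom.SetAtomMapNum(atom.GetIdx() + 1)

mapped_smiles = Chem.MolToSmiles(mol)

# After re-parsing, the map numbers recover the original indices
# regardless of how the atoms are ordered in the new molecule.
mol2 = Chem.MolFromSmiles(mapped_smiles)
new_to_old = {a.GetIdx(): a.GetAtomMapNum() - 1 for a in mol2.GetAtoms()}

print(mapped_smiles, new_to_old)
```

The labels travel with the string, so a bond stored as a pair of map numbers does not depend on the indexing of any particular RDKit version. The trade-off is that the mapped SMILES is no longer the plain canonical SMILES of the molecule.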
Re: [Rdkit-discuss] Are atom and bond indexes deterministic?
Hi Peter,

to the best of my knowledge: for a given SMILES string, you should always end up with the same molecule object.

On the other hand, generation of (canonical / unique) SMILES often reorders atoms and bonds (to ensure that the SMILES is unique for a given structure). A conversion Molecule -> SMILES -> Molecule could thus lead to a different ordering of atoms and bonds and you will have to canonicalize your structure before you generate your index. [Or make sure that you use non-canonical SMILES.]

Best,
Nils

On 02.10.2018 at 22:32, Peter St. John wrote:
> If I store a molecule as a SMILES string, along with relevant
> information about different bonds, is it safe to annotate those bond
> entries by bond index?
>
> I.e., if I create a new rdkit Molecule with
> rdkit.Chem.MolFromSmiles(xxx), will the bond ordering always be the
> same? If not, does anyone know a robust way of specifying a bond
> within a molecule as a string-based representation?
>
> Thanks for the help!
> -- Peter
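The reordering described here is easy to see on a small case. A minimal sketch, assuming RDKit is installed and using ethanol written oxygen-first as an arbitrary non-canonical input:

```python
from rdkit import Chem

original = Chem.MolFromSmiles("OCC")  # ethanol written oxygen-first
canonical_smiles = Chem.MolToSmiles(original)  # canonical output by default
round_tripped = Chem.MolFromSmiles(canonical_smiles)

original_symbols = [a.GetSymbol() for a in original.GetAtoms()]
round_trip_symbols = [a.GetSymbol() for a in round_tripped.GetAtoms()]

print(original_symbols)
print(canonical_smiles, round_trip_symbols)

# Once the stored string is itself canonical, a further round trip is stable.
stable = Chem.MolToSmiles(round_tripped) == canonical_smiles
print(stable)
```

This is why the index should be generated from the molecule parsed from the stored (canonical) string, not from the molecule as it was before canonicalization.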