Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Dimitri Maziuk
On 06/15/2017 01:14 PM, Brian Kelley wrote:
> Sorry to hear about the flooding.

>> Unfortunately we got flooded the day before yesterday and the servers doing
>> the crunching are currently down.

I should have mentioned that the server (URL is in the article), which
I'll hopefully get back up today, will output a MOL file with atoms
ordered as per the article.

The downside is that it only works on 3D MOL files.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Brian Kelley
Sorry to hear about the flooding.

As an aside, if you want to get the SMILES atom output order, it is saved as a
property on the molecule after a call to MolToSmiles.

To get to the property, use mol.GetPropsAsDict(True, True) (the two arguments
include private and computed properties); it will be there under the key
"_smilesAtomOutputOrder".

We should probably make a helper function for this.
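Here is a minimal sketch of reading that property back, assuming a recent RDKit Python build. Depending on the RDKit version, the value may come back as a sequence of integers or as a string such as "[2,1,0,]". The hypothetical helper smiles_output_order below accepts either form.

```python
import ast

from rdkit import Chem


def smiles_output_order(mol):
    """Atom indices of `mol` in the order MolToSmiles wrote them.

    Hypothetical helper: MolToSmiles must already have been called on `mol`.
    """
    # includePrivate=True, includeComputed=True: the key starts with "_"
    # and is stored as a computed property.
    raw = mol.GetPropsAsDict(True, True).get("_smilesAtomOutputOrder")
    if raw is None:
        # Fall back to the plain string form of the property.
        raw = mol.GetProp("_smilesAtomOutputOrder")
    if isinstance(raw, str):
        # String form such as "[2,1,0,]"; literal_eval accepts the trailing comma.
        raw = ast.literal_eval(raw)
    return [int(i) for i in raw]


mol = Chem.MolFromSmiles("OCC")   # ethanol, oxygen written first
smi = Chem.MolToSmiles(mol)       # canonical SMILES
order = smiles_output_order(mol)
print(smi, order)
```

Each entry is an atom index in the input molecule. The entries are listed in the order in which those atoms appear in the output SMILES.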


Brian Kelley

> On Jun 15, 2017, at 6:27 PM, Dimitri Maziuk  wrote:
> 
>> On 06/15/2017 10:13 AM, Maciek Wójcikowski wrote:
>> Hi,
>> 
>> If you really want to rely on the order of atoms, you can renumber them
>> any way you like with Chem.RenumberAtoms()
>> http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
>> There is also a function that returns the canonical rank of each atom:
>> Chem.CanonicalRankAtoms(). If I remember correctly, that order may differ
>> from the one in the canonical SMILES, although that might have changed.
> 
> https://www.nature.com/articles/sdata201773
> 
> Unfortunately we got flooded the day before yesterday and the servers doing
> the crunching are currently down.
> 
> -- 
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Dimitri Maziuk
On 06/15/2017 10:13 AM, Maciek Wójcikowski wrote:
> Hi,
> 
> If you really want to rely on the order of atoms, you can renumber them
> any way you like with Chem.RenumberAtoms()
> http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
> There is also a function that returns the canonical rank of each atom:
> Chem.CanonicalRankAtoms(). If I remember correctly, that order may differ
> from the one in the canonical SMILES, although that might have changed.

https://www.nature.com/articles/sdata201773

Unfortunately we got flooded the day before yesterday and the servers doing
the crunching are currently down.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Maciek Wójcikowski
Hi,

If you really want to rely on the order of atoms, you can renumber them
any way you like with Chem.RenumberAtoms()
http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
There is also a function that returns the canonical rank of each atom:
Chem.CanonicalRankAtoms(). If I remember correctly, that order may differ
from the one in the canonical SMILES, although that might have changed.
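Here is a sketch of combining the two functions. It assumes breakTies=True (the default in current RDKit releases), so every atom gets a distinct rank. RenumberAtoms expects newOrder[i] to be the old index of the atom that becomes atom i, so the ranks are inverted with an argsort first. renumber_canonically is a hypothetical helper name.

```python
from rdkit import Chem


def renumber_canonically(mol):
    """Return (renumbered copy, canonical ranks, permutation used).

    Hypothetical helper. With breakTies=True the ranks form a permutation
    of 0..N-1, one distinct rank per atom.
    """
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=True))
    # RenumberAtoms wants new_order[new_idx] == old_idx, i.e. the inverse of
    # the rank permutation (sort atom indices by their rank).
    new_order = sorted(range(mol.GetNumAtoms()), key=lambda idx: ranks[idx])
    return Chem.RenumberAtoms(mol, new_order), ranks, new_order


mol = Chem.MolFromSmiles("OCC(=O)N")
renumbered, ranks, new_order = renumber_canonically(mol)
for new_idx, atom in enumerate(renumbered.GetAtoms()):
    print(new_idx, atom.GetSymbol(), "was atom", new_order[new_idx])
```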


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-06-15 9:03 GMT+02:00 Brian Kelley :

> Yes, atoms are always added in file order.  It would take a major change
> in rdkit to change/violate this.
>
> 
> Brian Kelley
>
> > On Jun 15, 2017, at 7:52 AM, Francois BERENGER <
> > beren...@bioreg.kyushu-u.ac.jp> wrote:
> >
> > Hello,
> >
> > If I read a molecule from a .sdf file, will the atom indexes be
> > conserved/preserved?
> >
> > 1st atom in the file will have index 0,
> > 2nd index 1, etc.
> >
> > And, will this always hold in the future?
> > Is this an invariant of rdkit?
> >
> > Thanks,
> > F.
> >


Re: [Rdkit-discuss] a 2D to 3D (smi to sdf) conformer generator python script using rdkit

2017-06-15 Thread Francois BERENGER

On 06/15/2017 03:50 PM, Greg Landrum wrote:
> Thanks for letting people know about this. If we can get a consensus
> form that people agree makes sense, this might be a nice addition to
> either the RDKit/Scripts directory or the cookbook.
>
> A couple of smallish comments after a quick skim:
> - I would really strongly encourage you to use the ETKDG parameters
> (http://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654) when doing the
> embedding. This really helps a lot with the quality of the conformations
> and lets you skip the UFF step.
> - The built-in RMSD pruning has improved since JP's article; it may be
> worth looking at that.

It would be nice to have a much faster protocol than the one I implemented.

This protocol (the one from the paper) is super slow because of
the RMSD pruning step (not because of UFF).
The more conformers per molecule you need, the slower it gets.

But it works, at least.

The problem with changing the protocol to something more modern
is that you have to redo all the statistical validation the authors
did to confirm that it works well,
which requires quite some time and motivation.

> - If you want to make the embedding step itself robust, it wouldn't be a
> bad idea to try switching to random coordinate generation if the initial
> embedding fails.

Thanks for the comment. I might update this part if I see it fail.

Regards,
F.

> Best,
> -greg



> On Wed, Jun 14, 2017 at 9:27 AM, Francois BERENGER wrote:
>
>> Hello,
>>
>> I gave a try at reproducing the protocol described in:
>>
>> @article{DBLP:journals/jcisd/EbejerMD12,
>>   author    = {Jean{-}Paul Ebejer and Garrett M. Morris and
>>                Charlotte M. Deane},
>>   title     = {Freely Available Conformer Generation Methods:
>>                How Good Are They?},
>>   journal   = {Journal of Chemical Information and Modeling},
>>   volume    = {52},
>>   number    = {5},
>>   pages     = {1146--1158},
>>   year      = {2012},
>>   url       = {https://doi.org/10.1021/ci2004658},
>>   doi       = {10.1021/ci2004658},
>> }
>>
>> The resulting script is there:
>>
>> https://github.com/UnixJunkie/smi2sdf3d
>>
>> I hope I reproduced their protocol exactly.
>> Sorry, my Python is so rusty these days.
>>
>> Comments and contributions are welcome.
>>
>> Even auditing the code for correctness is welcome since it is
>> doing some scientific computation.
>>
>> It is a little bit too slow for my taste.
>>
>> You can use it like this to get a maximum of 10 conformers
>> per molecule in your input.smi file:
>>
>> ./smi2sdf.py 10 input.smi output.sdf
>>
>> Best regards,
>> Francois.



Re: [Rdkit-discuss] AllChem.GetConformerRMSD: this is not RMSD between two conformers but an upper bound of it

2017-06-15 Thread Brian Kelley
Thanks for the documentation fix; I had read it the same way as Francois.




Brian Kelley

> On Jun 15, 2017, at 8:02 AM, Greg Landrum  wrote:
> 
> 
> 
>> On Thu, Jun 15, 2017 at 6:30 AM, Francois BERENGER 
>>  wrote:
>> 
>> I am afraid that with AllChem.GetConformerRMS, one doesn't get the RMSD
>> between the two conformers but an upper bound of it.
> 
> The documentation for this function is misleading:
> 
> In [21]: AllChem.GetConformerRMS?
> Signature: AllChem.GetConformerRMS(mol, confId1, confId2, atomIds=None, prealigned=False)
> Docstring:
> Returns the RMS between two conformations.
> By default, the conformers will be aligned to the first conformer
> of the molecule (i.e. the reference) before RMS calculation and,
> as a side-effect, will be left in the aligned state.
> 
> Arguments:
>   - mol:        the molecule
>   - confId1:    the id of the first conformer
>   - confId2:    the id of the second conformer
>   - atomIds:    (optional) list of atom ids to use a points for
>                 alingment - defaults to all atoms
>   - prealigned: (optional) by default the conformers are assumed
>                 be unaligned and will therefore be aligned to the
>                 first conformer
> 
> 
> The alignment is done to the first conformer (i.e. confId1).[1]
> Here's a demonstration of that:
> In [31]: AllChem.GetConformerRMS(m,0,1,prealigned=True)
> Out[31]: 9.1593890932638349
> 
> In [32]: AllChem.GetConformerRMS(m,0,2,prealigned=True)
> Out[32]: 3.8219771356556071
> 
> In [33]: AllChem.GetConformerRMS(m,1,2,prealigned=True)
> Out[33]: 8.597878324406647
> 
> In [34]: AllChem.GetConformerRMS(m,1,2)
> Out[34]: 1.1067869816465845   # conformer 2 is now aligned to conformer 1
> 
> In [35]: AllChem.GetConformerRMS(m,0,1,prealigned=True)
> Out[35]: 9.1593890932638349   # the RMS between confs 0 and 1 hasn't changed
> 
> In [36]: AllChem.GetConformerRMS(m,0,2,prealigned=True)
> Out[36]: 9.4691776880629508   # the RMS between confs 0 and 2 has changed
> 
> I will clean that documentation up.
> 
> 
> -greg
> [1] since that's a "conformer of the molecule" the documentation isn't 
> actually wrong, but it's misleading enough to be effectively wrong.
>  
>> 
>> I understand from the doc that if they are aligned, they are aligned
>> to the first conformer of the molecule.
>> 
>> To get the real RMSD between two conformers, they must
>> be superimposed together, not to a third conformer.
>> 
>> Please tell me if I'm wrong.
>> 
>> Regards,
>> F.


Re: [Rdkit-discuss] AllChem.GetConformerRMSD: this is not RMSD between two conformers but an upper bound of it

2017-06-15 Thread Brian Kelley
The function you want is GetBestRMS; note that you can set the conformer IDs
for both the probe and the reference.

http://www.rdkit.org/Python_Docs/rdkit.Chem.AllChem-module.html#GetBestRMS
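Here is a sketch that contrasts the two calls on the same molecule. It assumes a recent RDKit release in which rdMolAlign.GetBestRMS takes the probe molecule first, with prbId/refId keywords. Older releases shipped a Python AllChem.GetBestRMS with a different argument order. Pentan-1-ol, the seed and the conformer count are arbitrary choices. Hydrogens are removed after embedding, so both numbers are heavy-atom RMSDs. The probe is copied because GetBestRMS leaves it aligned to the reference.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign

mol = Chem.AddHs(Chem.MolFromSmiles("CCCCCO"))
cids = list(AllChem.EmbedMultipleConfs(mol, numConfs=3, randomSeed=42))
mol = Chem.RemoveHs(mol)  # heavy atoms only; conformers are kept

ref_id, a, b = cids[0], cids[1], cids[2]

# Optimal RMSD between conformers a and b: b is superimposed directly onto a,
# also trying symmetry-equivalent atom mappings. Work on a copy, because the
# probe conformer is left aligned to the reference.
probe = Chem.Mol(mol)
best = rdMolAlign.GetBestRMS(probe, mol, prbId=b, refId=a)

# The quantity discussed in this thread: align a and b to a third conformer
# (the first one in confIds), then measure a vs. b without realigning.
work = Chem.Mol(mol)
rdMolAlign.AlignMolConformers(work, confIds=[ref_id, a, b])
via_third = AllChem.GetConformerRMS(work, a, b, prealigned=True)

print("best RMS:", best)
print("RMS after aligning both to a third conformer:", via_third)
```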


Brian Kelley

> On Jun 15, 2017, at 5:30 AM, Francois BERENGER 
>  wrote:
> 
> Hello,
> 
> I am afraid that with AllChem.GetConformerRMS, one doesn't get the RMSD
> between the two conformers but an upper bound of it.
> 
> I understand from the doc that if they are aligned, they are aligned
> to the first conformer of the molecule.
> 
> To get the real RMSD between two conformers, they must
> be superimposed together, not to a third conformer.
> 
> Please tell me if I'm wrong.
> 
> Regards,
> F.


Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Brian Kelley
Yes, atoms are always added in file order.  It would take a major change in 
rdkit to change/violate this.
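Here is one way to check this invariant yourself, as a sketch. It writes a molecule to a V2000 MOL block and reads the element symbols straight from the atom block (columns 32-34 of each atom line in the V2000 format). It then compares them with the atom order RDKit assigns when it parses the same block. The molecule and the helper name atom_symbols_in_file_order are illustrative.

```python
from rdkit import Chem


def atom_symbols_in_file_order(molblock):
    """Element symbols from the atom block of a V2000 MOL block, in file order."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])        # counts line: first 3 columns = atom count
    atom_lines = lines[4:4 + n_atoms]   # atom block follows the counts line
    return [line[31:34].strip() for line in atom_lines]


block = Chem.MolToMolBlock(Chem.MolFromSmiles("OCC(=O)N"))
file_order = atom_symbols_in_file_order(block)
parsed = Chem.MolFromMolBlock(block)
rdkit_order = [atom.GetSymbol() for atom in parsed.GetAtoms()]
print(file_order)
print(rdkit_order)
```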


Brian Kelley

> On Jun 15, 2017, at 7:52 AM, Francois BERENGER 
>  wrote:
> 
> Hello,
> 
> If I read a molecule from a .sdf file, will the atom indexes be 
> conserved/preserved?
> 
> 1st atom in the file will have index 0,
> 2nd index 1, etc.
> 
> And, will this always hold in the future?
> Is this an invariant of rdkit?
> 
> Thanks,
> F.


Re: [Rdkit-discuss] AllChem.GetConformerRMSD: this is not RMSD between two conformers but an upper bound of it

2017-06-15 Thread Greg Landrum
On Thu, Jun 15, 2017 at 6:30 AM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

>
> I am afraid that with AllChem.GetConformerRMS, one doesn't get the RMSD
> between the two conformers but an upper bound of it.
>

The documentation for this function is misleading:

In [21]: AllChem.GetConformerRMS?
Signature: AllChem.GetConformerRMS(mol, confId1, confId2, atomIds=None, prealigned=False)
Docstring:
Returns the RMS between two conformations.
By default, the conformers will be aligned to the first conformer
of the molecule (i.e. the reference) before RMS calculation and,
as a side-effect, will be left in the aligned state.

Arguments:
  - mol:        the molecule
  - confId1:    the id of the first conformer
  - confId2:    the id of the second conformer
  - atomIds:    (optional) list of atom ids to use a points for
                alingment - defaults to all atoms
  - prealigned: (optional) by default the conformers are assumed
                be unaligned and will therefore be aligned to the
                first conformer



The alignment is done to the first conformer (i.e. confId1).[1]
Here's a demonstration of that:

In [31]: AllChem.GetConformerRMS(m,0,1,prealigned=True)
Out[31]: 9.1593890932638349

In [32]: AllChem.GetConformerRMS(m,0,2,prealigned=True)
Out[32]: 3.8219771356556071

In [33]: AllChem.GetConformerRMS(m,1,2,prealigned=True)
Out[33]: 8.597878324406647

In [34]: AllChem.GetConformerRMS(m,1,2)
Out[34]: 1.1067869816465845   # conformer 2 is now aligned to conformer 1

In [35]: AllChem.GetConformerRMS(m,0,1,prealigned=True)
Out[35]: 9.1593890932638349   # the RMS between confs 0 and 1 hasn't changed

In [36]: AllChem.GetConformerRMS(m,0,2,prealigned=True)
Out[36]: 9.4691776880629508   # the RMS between confs 0 and 2 has changed


I will clean that documentation up.


-greg
[1] since that's a "conformer of the molecule" the documentation isn't
actually wrong, but it's misleading enough to be effectively wrong.
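The upper-bound claim in the subject line can be checked numerically. Below is a minimal NumPy sketch using the standard Kabsch superposition. The helper names and the random point sets are illustrative; this is not RDKit code. Applying the same rigid motion to two coordinate sets leaves their RMSD unchanged. So aligning both onto a third set is just one particular superposition of the pair, and its RMSD can never be lower than that of the optimal direct superposition.

```python
import numpy as np


def optimal_rotation(p, q):
    """Rotation R minimising sum ||R p_i - q_i||^2 for centred p, q (N x 3)."""
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid returning a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def superimpose(p, ref):
    """Return a copy of p rigidly moved onto ref (Kabsch)."""
    pc, rc = p.mean(axis=0), ref.mean(axis=0)
    r = optimal_rotation(p - pc, ref - rc)
    return (p - pc) @ r.T + rc


def rmsd(p, q):
    """Plain RMSD between two coordinate sets, without any realignment."""
    return float(np.sqrt(((p - q) ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
a, b, c = (rng.normal(size=(10, 3)) for _ in range(3))

direct = rmsd(superimpose(b, a), a)                     # b moved straight onto a
via_third = rmsd(superimpose(a, c), superimpose(b, c))  # both moved onto c
print(direct, via_third)
```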


>
> I understand from the doc that if they are aligned, they are aligned
> to the first conformer of the molecule.
>
> To get the real RMSD between two conformers, they must
> be superimposed together, not to a third conformer.
>
> Please tell me if I'm wrong.
>
> Regards,
> F.


[Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Francois BERENGER

Hello,

If I read a molecule from a .sdf file, will the atom indexes be 
conserved/preserved?


1st atom in the file will have index 0,
2nd index 1, etc.

And, will this always hold in the future?
Is this an invariant of rdkit?

Thanks,
F.



Re: [Rdkit-discuss] a 2D to 3D (smi to sdf) conformer generator python script using rdkit

2017-06-15 Thread Greg Landrum
Thanks for letting people know about this. If we can get a consensus form
that people agree makes sense, this might be a nice addition to either the
RDKit/Scripts directory or the cookbook.

A couple of smallish comments after a quick skim:
- I would really strongly encourage you to use the ETKDG parameters
(http://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654) when doing the
embedding. This really helps a lot with the quality of the conformations
and lets you skip the UFF step.
- The built-in RMSD pruning has improved since JP's article; it may be
worth looking at that.
- If you want to make the embedding step itself robust, it wouldn't be a
bad idea to try switching to random coordinate generation if the initial
embedding fails.
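The three suggestions can be combined in a short sketch. It assumes an RDKit build that provides AllChem.ETKDG() parameter objects; newer releases also offer ETKDGv3(). The 0.5 Å pruning threshold and the seed are illustrative values, not recommendations from this thread. embed_conformers is a hypothetical helper.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_conformers(smiles, n_confs=10, prune_rms=0.5, seed=42):
    """Embed up to n_confs ETKDG conformers; returns (mol with Hs, conformer ids)."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    params = AllChem.ETKDG()            # ETKDG embedding parameters
    params.randomSeed = seed            # reproducible embedding
    params.pruneRmsThresh = prune_rms   # built-in RMSD pruning during embedding
    cids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))
    if not cids:
        # Fallback: start from random coordinates instead of the default
        # eigenvalue-based initial coordinates.
        params.useRandomCoords = True
        cids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))
    return mol, cids


mol, cids = embed_conformers("CCCCO", n_confs=5)
print(len(cids), "conformers kept after pruning")
```

An empty list after the second call means embedding failed for that molecule even when starting from random coordinates.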

Best,
-greg



On Wed, Jun 14, 2017 at 9:27 AM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

> Hello,
>
> I gave a try at reproducing the protocol described in:
>
> @article{DBLP:journals/jcisd/EbejerMD12,
>   author= {Jean{-}Paul Ebejer and Garrett M. Morris and
>Charlotte M. Deane},
>   title = {Freely Available Conformer Generation Methods:
>How Good Are They?},
>   journal   = {Journal of Chemical Information and Modeling},
>   volume= {52},
>   number= {5},
>   pages = {1146--1158},
>   year  = {2012},
>   url   = {https://doi.org/10.1021/ci2004658},
>   doi   = {10.1021/ci2004658},
> }
>
> The resulting script is there:
>
> https://github.com/UnixJunkie/smi2sdf3d
>
> I hope I reproduced their protocol exactly.
> Sorry, my Python is so rusty these days.
>
> Comments and contributions are welcome.
>
> Even auditing the code for correctness is welcome since it is
> doing some scientific computation.
>
> It is a little bit too slow for my taste.
>
> You can use it like this to get a maximum of 10 conformers
> per molecule in your input.smi file:
>
> ./smi2sdf.py 10 input.smi output.sdf
>
> Best regards,
> Francois.


Re: [Rdkit-discuss] how to append conformer number to molecule name in SDF output file?

2017-06-15 Thread Greg Landrum
Sure, you can do this by setting the molecule's "_Name" property before each
call to SDWriter.write().
Here's a short example:

In [15]: sio = StringIO()

In [16]: w = Chem.SDWriter(sio)

In [17]: basen = m.GetProp('_Name')

In [18]: for conf in m.GetConformers():
    ...:     cid = conf.GetId()
    ...:     nm = "{}-{}".format(basen,cid)
    ...:     m.SetProp('_Name',nm)
    ...:     w.write(m,confId=cid)
    ...:

In [19]: w.flush()

In [20]: print(sio.getvalue()[:20])
mol1-0
 RDKit

This just adds a conformation ID to the end of whatever the molecule's name
starts out as.
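Francois asked for a zero-padded molName_001 style, and a small variant of the loop above does that with a format spec. This is a sketch. write_named_conformers is a hypothetical helper, and the conformer IDs are written as-is. EmbedMultipleConfs numbers them from 0, so the first conformer becomes _000. The parent name is restored afterwards.

```python
from io import StringIO

from rdkit import Chem
from rdkit.Chem import AllChem


def write_named_conformers(mol, writer, width=3):
    """Write each conformer as <name>_<zero-padded conformer id> via an SDWriter."""
    base = mol.GetProp("_Name") if mol.HasProp("_Name") else "mol"
    for conf in mol.GetConformers():
        cid = conf.GetId()
        mol.SetProp("_Name", f"{base}_{cid:0{width}d}")
        writer.write(mol, confId=cid)
    mol.SetProp("_Name", base)  # restore the parent name


mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
mol.SetProp("_Name", "mol1")
AllChem.EmbedMultipleConfs(mol, numConfs=3, randomSeed=42)

sio = StringIO()
writer = Chem.SDWriter(sio)
write_named_conformers(mol, writer)
writer.flush()

# The first line of each SD record is the molecule name.
names = [rec.splitlines()[0] for rec in sio.getvalue().split("$$$$\n") if rec.strip()]
print(names)
```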

Hope this helps,
-greg



On Wed, Jun 14, 2017 at 5:00 AM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

> Hello,
>
> I am generating conformers.
>
> When I write them out, I'd like them to be named like this:
>
> molName_001
> molName_002
> ...
>
> So that, down the line, I know which conformer of which molecule
> I am working with.
>
> So: "parent" molecule name followed by one '_' then the conformer Id
> in a fixed format.
>
> Is it possible to do so with a Chem.SDWriter?
>
> Thanks a lot,
> Francois.