Re: [Rdkit-discuss] RDKit cartridge similarity search speeds(?)

2014-05-08 Thread Greg Landrum
Hi James,

[fair warning before I start: we quickly hit the limits of my PostgreSQL
expertise here]


On Thu, May 8, 2014 at 2:35 PM, James Davidson wrote:

> Dear All,
>
> I have recently been spending a bit more time with the RDKit cartridge,
> and have what is probably a very naïve question…
>
> Having built some RDKit fingerprints for ChEMBL_18, I see the following
> behaviour (for clarification – ‘ecfp4_bv’ is the column in my rdk.fps table
> that has been generated using morganbv_fp(mol, 2)):
>
> chembl_18=# \timing on
> Timing is on.
> chembl_18=# set rdkit.tanimoto_threshold=0.5;
> SET
> Time: 0.167 ms
> chembl_18=# select chembl_id from rdk.fps where ecfp4_bv %
> morganbv_fp('c1nnccc1'::mol,2);
>   chembl_id
> -------------
>  CHEMBL15719
> (1 row)
>
> Time: 2033.348 ms
> chembl_18=# select chembl_id from rdk.fps where tanimoto_sml(ecfp4_bv,
> morganbv_fp('c1nnccc1'::mol, 2)) > 0.5;
>   chembl_id
> -------------
>  CHEMBL15719
> (1 row)
>
> Time: 6843.605 ms
>
> I can see that the query plans are different in the two cases, but I don’t
> fully understand why – see below:
>
> *QUERY 1 (with explain analyze)*
>
> chembl_18=# explain analyze select chembl_id from rdk.fps where ecfp4_bv %
> morganbv_fp('c1nnccc1'::mol,2);
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Bitmap Heap Scan on fps  (cost=106.91..5298.31 rows=1352 width=13) (actual
> time=1774.986..1774.987 rows=1 loops=1)
>    Recheck Cond: (ecfp4_bv % '\x0100084200048204'::bfp)
>    ->  Bitmap Index Scan on fps_ecfp4bv_idx  (cost=0.00..106.57 rows=1352
> width=0) (actual time=1774.969..1774.969 rows=1 loops=1)
>          Index Cond: (ecfp4_bv % '\x0100084200048204'::bfp)
>  Total runtime: 1775.035 ms
> (5 rows)
>
> Time: 1776.133 ms
>
> *QUERY 2 (with explain analyze)*
>
> chembl_18=# explain analyze select chembl_id from rdk.fps where
> tanimoto_sml(ecfp4_bv, morganbv_fp('c1nnccc1'::mol, 2)) > 0.5;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Seq Scan on fps  (cost=0.00..388808.17 rows=450793 width=13) (actual
> time=1278.115..6953.977 rows=1 loops=1)
>    Filter: (tanimoto_sml(ecfp4_bv, '\x0100084200048204'::bfp) > 0.5::double precision)
>    Rows Removed by Filter: 1352377
>  Total runtime: 6954.010 ms
> (4 rows)
>
> Time: 6955.103 ms
>

What these are telling you is that the second query is not using the index:
it's a sequential scan, so it has to test every row of the table. This
happens because the index is defined for the operator %, but not for the
function tanimoto_sml(). There may be an approach to get the index set up
using that function, but there we reach the limits of my expertise.
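The general idea can be sketched outside the database: a search that knows the threshold up front can skip candidates that provably cannot reach it, whereas a bare `tanimoto_sml(...) > 0.5` filter has to score every row. The pure-Python sketch below models fingerprints as ints and uses the bit-count bound Tanimoto(a, b) <= min(|a|, |b|) / max(|a|, |b|) to reject rows without scoring them. It illustrates the principle only; it is not a description of how the cartridge's index works, and all function names and bit patterns are made up.

```python
from typing import List, Tuple


def popcount(fp: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """|a AND b| / |a OR b|; defined as 0.0 when both fingerprints are empty."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def threshold_search(query: int, fps: List[int],
                     threshold: float) -> Tuple[List[Tuple[int, float]], int]:
    """Return (hits, n_scored).

    hits are (row, similarity) pairs with similarity >= threshold;
    n_scored counts rows that needed a full Tanimoto calculation
    (the others were rejected on bit counts alone).
    """
    q = popcount(query)
    hits: List[Tuple[int, float]] = []
    n_scored = 0
    for row, fp in enumerate(fps):
        n = popcount(fp)
        lo, hi = min(q, n), max(q, n)
        # Tanimoto can never exceed lo / hi, so skip rows that cannot qualify.
        if hi == 0 or lo / hi < threshold:
            continue
        n_scored += 1
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((row, sim))
    return hits, n_scored


fps = [0b1011, 0b0001, 0b11110000, 0b1010]
hits, n_scored = threshold_search(0b1011, fps, 0.5)
print(hits, n_scored)
```

Here the second row is never scored: with 1 bit set against the query's 3, its similarity is at most 1/3.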


>  It seems conceptually ‘easier’ to add the similarity value as part of
> the query, rather than setting it as a variable ahead of the query; but
> clearly I should be doing it the latter way for performance reasons.  So
> even if I don’t fully understand why at the moment, am I correct in
> thinking that queries of this sort should always be run with the similarity
> operators (%, #)?  And if so, is the rdkit.tanimoto_threshold variable set
> at the level of the session, the user, or the database?
>
>
It's set at the session level.

When doing similarity searches, I find it generally helpful to also include
the <%> operator in an "order by" clause so that the results come back in
sorted order.
So instead of this:
chembl_17=# select molregno from rdk.fps where mfp2 %
morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1');
 molregno
----------
   412312
   412302
   412310
   441378
   470082
   773946
   775269
   911501
  1015485
  1034321
  1040255
  1040496
  1042958
  1043871
  1044892
  1045663
  1047691
  1049393
(18 rows)

Time: 1042.310 ms

I do this:
chembl_17=# select molregno from rdk.fps where mfp2 %
morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1')
 order by
morganbv_fp('Cc1ccc2nc(N(C)CC(=O)O)sc2c1') <%> mfp2;
 molregno
----------
   412312
   470082
  1040255
   773946
  1044892
  1049393
  1040496
   441378
  1047691
  1042958
   412302
  1043871
   412310
  1045663
   911501
   775269
  1015485
  1034321
(18 rows)

Time: 1032.266 ms

Notice that this doesn't make things any slower.
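The same filter-then-sort pattern in plain Python, as a sketch: keep the rows at or above the threshold, sort on ascending Tanimoto distance (1 - similarity) so the most similar rows come first, as the `<%>` ordering above does, and keep the similarity next to each ID. Fingerprints are modelled as ints; the function name, IDs and bit patterns are invented for illustration.

```python
from typing import Dict, List, Tuple


def popcount(fp: int) -> int:
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def ranked_search(query: int, fps_by_id: Dict[str, int],
                  threshold: float) -> List[Tuple[str, float]]:
    """(id, similarity) pairs with similarity >= threshold, most similar first.

    Sorts on Tanimoto distance (1 - similarity); ties are broken by id so
    the order is reproducible.
    """
    hits = [(mol_id, tanimoto(query, fp)) for mol_id, fp in fps_by_id.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    hits.sort(key=lambda hit: (1.0 - hit[1], hit[0]))
    return hits


fps_by_id = {"a": 0b1011, "b": 0b1010, "c": 0b0001, "d": 0b1001}
result = ranked_search(0b1011, fps_by_id, threshold=0.5)
for mol_id, sim in result:
    print(f"{mol_id}  {sim:.3f}")
```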

It's nice to see the actual similarity values:

Re: [Rdkit-discuss] Chem.PandasTools

2014-05-08 Thread Paul . Czodrowski
Dear Grégori, 

when storing the image into a new data frame:
"
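import pandas as pd
from rdkit.Chem import Draw, rdChemReactions

MMP_reaction = rdChemReactions.ReactionFromSmarts("[*:1][H]>>[*:1]C")
newnew_df = pd.DataFrame(columns=['fig'], index=[1])
# .ix is deprecated (removed in pandas 1.0); .loc sets the cell directly
newnew_df.loc[1, 'fig'] = Draw.ReactionToImage(MMP_reaction)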
"

apparently, the image can be stored in a data frame, but in the IPython
notebook it is displayed as "
> Hi Paul,
> 
> You first have to read the MMP into a reaction object 
> (Chem.ReactionFromSmarts).
> 
> Greg
> 
> On Friday, May 9, 2014,  wrote:
> Dear Gregori & Samo,
> 
> thanks for your hints.
> 
> I just tried running
> 
> Draw.ReactionToImage("[*:1][H]>>[*:1]C")
> 
> =>
> 
> AttributeError: 'str' object has no attribute 'GetNumReactantTemplates'
> 
> 
> 
> BTW, how would I finally add a picture to a Pandas data frame?
> 
> 
> Cheers,
> Paul


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.

Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
• 3 signs your SCM is hindering your productivity
• Requirements for releasing software faster
• Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Chem.PandasTools

2014-05-08 Thread Gerebtzoff, Gregori
Hi Paul,

You first have to read the MMP into a reaction object
(Chem.ReactionFromSmarts).

Greg

On Friday, May 9, 2014,  wrote:

> Dear Gregori & Samo,
>
> thanks for your hints.
>
> I just tried running
>
> Draw.ReactionToImage("[*:1][H]>>[*:1]C")
>
> =>
>
> AttributeError: 'str' object has no attribute 'GetNumReactantTemplates'
>
>
>
> BTW, how would I finally add a picture to a Pandas data frame?
>
>
> Cheers,
> Paul
>
>
> >
> > Hi Paul,
> >
> > The Draw module also contains a "ReactionToImage" function;
> > Your MMP can be read as a reaction.
> > Hope this helps further!
> >
> > Grégori
>
>


Re: [Rdkit-discuss] Chem.PandasTools

2014-05-08 Thread Paul . Czodrowski
Dear Gregori & Samo,

thanks for your hints.

I just tried running

Draw.ReactionToImage("[*:1][H]>>[*:1]C")

=>

AttributeError: 'str' object has no attribute 'GetNumReactantTemplates'



BTW, how would I finally add a picture to a Pandas data frame?


Cheers,
Paul


> 
> Hi Paul,
> 
> The Draw module also contains a "ReactionToImage" function;
> Your MMP can be read as a reaction.
> Hope this helps further!
> 
> Grégori




Re: [Rdkit-discuss] Chem.PandasTools

2014-05-08 Thread Gerebtzoff, Gregori
Hi Paul,

The Draw module also contains a "ReactionToImage" function;
Your MMP can be read as a reaction.
Hope this helps further!

Grégori

Date: Thu, 8 May 2014 16:31:32 +0200
> From: paul.czodrow...@merckgroup.com
> Subject: [Rdkit-discuss] Chem.PandasTools
> To: rdkit-discuss@lists.sourceforge.net
> Message-ID:
> <
> ofc0c168e1.8dc7f4cf-onc1257cd2.004f2cec-c1257cd2.004fc...@merck.de>
> Content-Type: text/plain; charset="US-ASCII"
>
> Dear RDKitters,
>
> I started to play around with the great Chem.PandasTools contribution
> provided by Nicholas and Samo.
>
> Given such a data frame:
> "
>    Transformation    npairs
> 1  [*:1][H]>>[*:1]C  5
> "
>
> how do I depict the molecular transformation in the dataframe?
>
>
> I guess that I somehow have to integrate this function
> "
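> from rdkit import Chem
> from rdkit.Chem import Draw
>
> def showLine_MMP(in_string):
>     # first tab-separated field: the "LHS>>RHS" transformation
>     f = in_string.split("\t")
>     LHS = Chem.MolFromSmiles(f[0].split(">>")[0])
>     RHS = Chem.MolFromSmiles(f[0].split(">>")[1])
>     mols = [LHS, RHS]  # local list, so each call draws only its own pair
>     return Draw.MolsToGridImage(mols, molsPerRow=2)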
> "
>
> but I'm not sure how to accomplish this.
>
>
> Cheers & Thanks,
> Paul
>
>


Re: [Rdkit-discuss] Chem.PandasTools

2014-05-08 Thread Samo Turk
Hi,

I'm not sure if it will work but you can try:
df['new'] = df['Transformation'].map(showLine_MMP)
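Whatever ends up drawing the picture, the function handed to `.map()` receives one cell value at a time and returns one object. A stdlib-only sketch of the string-handling part, assuming each `Transformation` cell holds a single `reactants>>products` SMARTS (the name `split_transformation` and the second example string are invented):

```python
from typing import List, Tuple


def split_transformation(smarts: str) -> Tuple[str, str]:
    """Split a 'reactants>>products' string into its two sides.

    Raises ValueError unless there is exactly one '>>' separator, so
    malformed cells fail loudly instead of producing half a picture.
    """
    parts = smarts.strip().split(">>")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '>>' in {smarts!r}")
    return parts[0], parts[1]


# Stand-in for df['Transformation']; with pandas the same function could be
# passed as df['Transformation'].map(split_transformation).
transformations: List[str] = ["[*:1][H]>>[*:1]C", "[*:1]Cl>>[*:1]F"]
sides = [split_transformation(t) for t in transformations]
print(sides)
```

Each side could then go to `Chem.MolFromSmiles` as in `showLine_MMP`, or the whole string could be read as a reaction instead.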


Regards,
Samo


On Thu, May 8, 2014 at 4:31 PM,  wrote:

> Dear RDKitters,
>
> I started to play around with the great Chem.PandasTools contribution
> provided by Nicholas and Samo.
>
> Given such a data frame:
> "
>    Transformation    npairs
> 1  [*:1][H]>>[*:1]C  5
> "
>
> how do I depict the molecular transformation in the dataframe?
>
>
> I guess that I somehow have to integrate this function
> "
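> from rdkit import Chem
> from rdkit.Chem import Draw
>
> def showLine_MMP(in_string):
>     # first tab-separated field: the "LHS>>RHS" transformation
>     f = in_string.split("\t")
>     LHS = Chem.MolFromSmiles(f[0].split(">>")[0])
>     RHS = Chem.MolFromSmiles(f[0].split(">>")[1])
>     mols = [LHS, RHS]  # local list, so each call draws only its own pair
>     return Draw.MolsToGridImage(mols, molsPerRow=2)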
> "
>
> but I'm not sure how to accomplish this.
>
>
> Cheers & Thanks,
> Paul
>
>


Re: [Rdkit-discuss] Fwd: Tautomeric InChIs

2014-05-08 Thread Greg Landrum
As Markus points out: one of the features of standard InChI is that it
normalizes tautomers.
If you would like to generate a non-standard InChI that (most of the time)
keeps the tautomer you provide, you can pass the "/FixedH" option:
In [7]: m = Chem.MolFromSmiles('[H]N1C(=O)C(=C2C(=O)c3c(Cl)sc(F)c3N2[H])c2sc(F)c(Cl)c21')

In [8]: Chem.MolToInchi(m,options='/FixedH')
[16:34:46] WARNING: Omitted undefined stereo
Out[8]:
'InChI=1/C12H2Cl2F2N2O2S2/c13-3-6-8(21-10(3)15)2(12(20)18-6)4-7(19)1-5(17-4)11(16)22-9(1)14/h17H,(H,18,20)/f/h18H'

Note that these cannot be compared with standard InChIs.
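A minimal stdlib sketch of a guard to run before comparing InChI strings, assuming only the conventions visible in this thread: standard InChIs start with "InChI=1S/", non-standard ones with "InChI=1/", and the fixed-H layer is introduced by "/f". The helper names are hypothetical, and this is plain string inspection, not InChI parsing.

```python
from typing import Tuple


def inchi_flavour(inchi: str) -> Tuple[bool, bool]:
    """Return (is_standard, has_fixed_h_layer) for an InChI string."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI: {inchi!r}")
    version, _, rest = inchi[len(prefix):].partition("/")
    layers = rest.split("/")
    is_standard = version.endswith("S")  # "1S" vs "1"
    # layers[0] is the formula; later layers carry a one-letter prefix,
    # and the fixed-H layer is the one introduced by "f".
    has_fixed_h = any(layer.startswith("f") for layer in layers[1:])
    return is_standard, has_fixed_h


def comparable(inchi_a: str, inchi_b: str) -> bool:
    """Only compare InChIs generated with the same flavour."""
    return inchi_flavour(inchi_a) == inchi_flavour(inchi_b)


std = ("InChI=1S/C12H2Cl2F2N2O2S2/c13-3-6-8(21-10(3)15)2(12(20)18-6)"
       "4-7(19)1-5(17-4)11(16)22-9(1)14/h17H,(H,18,20)")
fixed_h = ("InChI=1/C12H2Cl2F2N2O2S2/c13-3-6-8(21-10(3)15)2(12(20)18-6)"
           "4-7(19)1-5(17-4)11(16)22-9(1)14/h17H,(H,18,20)/f/h18H")
print(inchi_flavour(std), inchi_flavour(fixed_h), comparable(std, fixed_h))
```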

-greg




On Thu, May 8, 2014 at 3:30 PM, Markus Sitzmann
wrote:

> -- Forwarded message --
> From: Markus Sitzmann 
> Date: Thu, May 8, 2014 at 3:27 PM
> Subject: Re: [Rdkit-discuss] Tautomeric InChIs
> To: Edward Pyzer-Knapp 
>
>
> Hi Edward,
>
> since your InChI is a Standard InChI ("1S/"): tautomeric forms are
> purposely *not* preserved by Standard InChI - that's why we created
> Standard InChI (with non-standard InChIs it is another story, those
> you can make tautomer-sensitive or insensitive). And actually many
> people complain that Standard InChI falls short in some cases
> regarding tautomer normalization :-).
>
> Best,
> Markus
>
> On Thu, May 8, 2014 at 3:16 PM, Edward Pyzer-Knapp
>  wrote:
> > Hi all,
> >
> > I have been playing around with RDKit for a while now - great work guys!
> >
> > I have recently hit an issue when using InChIs:
> >
> > When generating both an InChI and a SMILES from an RDKit Mol, I get two
> > different structures, even if I use the SMILES as the input for the InChI
> > generation.
> >
> > An example (I should add that this SMILES was generated by RDKit from a
> > Mol file):
> >
> > from rdkit.Chem import MolFromSmiles, MolToInchi
> >
> > smiles = "[H]N1C(=O)C(=C2C(=O)c3c(Cl)sc(F)c3N2[H])c2sc(F)c(Cl)c21"
> > mol = MolFromSmiles(smiles)
> > inchi = MolToInchi(mol)
> > print(inchi)
> > InChI=1S/C12H2Cl2F2N2O2S2/c13-3-6-8(21-10(3)15)2(12(20)18-6)4-7(19)1-5(17-4)11(16)22-9(1)14/h17H,(H,18,20)
> >
> > When comparing the SMILES and the InChI, the C=O has changed to an OH and
> > a C-N-H has changed to a C=N. I realise that these are tautomers of each
> > other, but surely the tautomeric form should be preserved when converting
> > SMILES to InChI? At the moment, going SMILES->InChI->SMILES does NOT
> > result in the original SMILES...
> >
> > There is a layer in the InChI standard which would allow description of
> > this; is there a way to turn that on?
> >
> > Many Thanks,
> >
> > Ed Pyzer-Knapp
> >
> >
> --
> > Is your legacy SCM system holding you back? Join Perforce May 7 to find
> out:
> > • 3 signs your SCM is hindering your productivity
> > • Requirements for releasing software faster
> > • Expert tips and advice for migrating your SCM now
> > http://p.sf.net/sfu/perforce
> > ___
> > Rdkit-discuss mailing list
> > Rdkit-discuss@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >
>
>
> --
> Is your legacy SCM system holding you back? Join Perforce May 7 to find
> out:
> • 3 signs your SCM is hindering your productivity
> • Requirements for releasing software faster
> • Expert tips and advice for migrating your SCM now
> http://p.sf.net/sfu/perforce
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
• 3 signs your SCM is hindering your productivity
• Requirements for releasing software faster
• Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Chem.PandasTools

2014-05-08 Thread Paul . Czodrowski
Dear RDKitters,

I started to play around with the great Chem.PandasTools contribution 
provided by Nicholas and Samo.

Given such a data frame:
"
   Transformation    npairs
1  [*:1][H]>>[*:1]C  5
"

how do I depict the molecular transformation in the dataframe?


I guess that I somehow have to integrate this function
"
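from rdkit import Chem
from rdkit.Chem import Draw

def showLine_MMP(in_string):
    # first tab-separated field: the "LHS>>RHS" transformation
    f = in_string.split("\t")
    LHS = Chem.MolFromSmiles(f[0].split(">>")[0])
    RHS = Chem.MolFromSmiles(f[0].split(">>")[1])
    mols = [LHS, RHS]  # local list, so each call draws only its own pair
    return Draw.MolsToGridImage(mols, molsPerRow=2)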
"

but I'm not sure how to accomplish this.


Cheers & Thanks,
Paul




Re: [Rdkit-discuss] MCS-based similarity in carbohydrates

2014-05-08 Thread Greg Landrum
Hi Sushil,

There is some information on using isotope information in MCS here:
http://rdkit.blogspot.ch/2014/01/an-interesting-mcs-use-case.html
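The `[0*]` atoms in Sushil's output presumably mean that every atom still carried isotope 0, so an isotope comparison had nothing to tell the atoms apart. A stdlib sketch of the labelling step, assuming the mol2 atom names are available as lists in atom order (the helper name and the example names are hypothetical): each distinct name gets a positive integer shared by both molecules, starting at 1 because 0 means "no isotope set".

```python
from typing import Dict, List, Sequence, Tuple


def shared_atom_classes(names_a: Sequence[str],
                        names_b: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Give every distinct atom name one positive integer, shared across
    both molecules, so equal names get equal labels and nothing else does.

    Labels start at 1 because isotope 0 means "unset" (the [0*] atoms).
    """
    table: Dict[str, int] = {}
    for name in list(names_a) + list(names_b):
        table.setdefault(name, len(table) + 1)
    return [table[n] for n in names_a], [table[n] for n in names_b]


# Hypothetical atom names: only the last atom differs between the structures.
labels_a, labels_b = shared_atom_classes(
    ["C1", "C2", "C3", "C4", "C5", "O5", "C6"],
    ["C1", "C2", "C3", "C4", "C5", "O5", "C7"],
)
print(labels_a, labels_b)
# With RDKit, each label could then be set via atom.SetIsotope(label),
# iterating over zip(mol.GetAtoms(), labels), before running an MCS
# search with atomCompare='isotopes'.
```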

-greg



On Thu, May 8, 2014 at 4:14 PM, Sushil Mishra wrote:

> Hello again,
>
> Using isotopes seems to produce something that I am unable to understand.
> Sorry, I am new to Python, so maybe I am not doing it correctly. I
> have two mol2 structures stored in mol2str_1 and mol2str_2. Only the atoms
> of the MCS have the same atom names in both; the rest of the names don't
> match each other at all. What I am doing is:
>
> -
> import rdkit.Chem
> import rdkit.Chem.MCS  # MCS module of that era, later superseded by rdFMCS
>
> # mol2str_1, mol2str_2 and maxtime come from elsewhere in the script
> mol1 = rdkit.Chem.MolFromMol2Block(mol2str_1, sanitize=False,
>                                    removeHs=False)
> mol2 = rdkit.Chem.MolFromMol2Block(mol2str_2, sanitize=False,
>                                    removeHs=False)
> mcs = rdkit.Chem.MCS.FindMCS((mol1, mol2), maximize='atoms',
>                              atomCompare='isotopes', bondCompare='bondtypes',
>                              ringMatchesRingOnly=True,
>                              completeRingsOnly=True, timeout=maxtime)
> print(mcs)
>
> --
> It is printing:
>
> '[0*](-!@[0*])(-!@[0*])(-!@[0*])-!@[0*]-!@[0*]-@1(-!@[0*])-@[0*]-@[0*](-!@[0*])(-@[0*](-@[0*](-@[0*]-@1(-!@[0*]-!@[0*])-!@[0*])(-!@[0*]-!@[0*])-!@[0*])(-!@[0*]-!@[0*])-!@[0*])-!@[0*](-!@[0*])(-!@[0*])-!@[0*]'
>
> Can someone tell me whether I am making a mistake here?
>
> /Sushil
>
>
>
> On Thu, May 8, 2014 at 1:57 PM, Sushil Mishra 
> wrote:
> >
> > Hi Andrew,
> >
> > Thanks a lot for the suggestions. The MCS result I am getting is
> > similar to what you said:
> >
> > '[#6](-!@[#6]-!@[#6]-@1(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6](-!@[#6])(-!@[#6])-!@[#6])(-!@[#6])-@[#6]-@1)(-!@[#6])(-!@[#6])-!@[#6]'
>
> >
> >
> > Your idea of using atomCompare = 'isotopes' seems good and I will
> > give it a try.
> >
> > I am using this approach to prepare input structures for Free Energy
> > Perturbation calculations. Thus, if I have structures A and B, I would
> > like to generate starting structures for the calculations which contain
> > the MCS atoms, plus the rest of the atoms of A (which will be changed to
> > dummies during the calculations), plus the rest of the atoms of B as
> > dummy atoms (dummies in the input structure, changed to their respective
> > atoms during the calculations). There can be another structure for
> > perturbing B into A.
> >
> > I will see if "isotopes" can solve the problem.
> >
> > Thanks
> > Sushil
> >
> >
> >
> > On Thu, May 8, 2014 at 1:04 PM, Andrew Dalke 
> wrote:
> >>
> >> Hi Sushil,
> >>
> >> On May 8, 2014, at 12:26 PM, Sushil Mishra wrote:
> >> > The MCS algorithm seems to me unable to handle chiral carbons, and it
> >> > cannot differentiate chiral changes in ligands.
> >>
> >> That's correct. The MCS algorithm in RDKit doesn't consider chirality.
> >> While in principle I think it would be possible to extend the current
> >> algorithm to support it, it would require some extensive changes.
> >>
> >> > Moreover, it also fails to differentiate between atoms in symmetrical
> >> > positions. For example, I have a 6-atom ring (C1, C2, C3, C4, C5, O5)
> >> > with one -CH3 at C1 and another structure with -CH3 at C5. MCS cannot
> >> > differentiate such structures.
> >>
> >> The MCS search returns a SMARTS pattern, in this case something like:
> >>
> >>   [#6]1~[#6]~[#6]~[#6]~[#6](~[#6])~[#8]1
> >>
> >> I don't think there's any way for a SMARTS, or at least a non-recursive
> >> SMARTS, to handle those other than symmetrically.
> >>
> >> It may be possible, through isotope labeling, for you to define your
> >> own atom classes, so that the C1 atom in one carbohydrate can only ever
> >> match the C1 atom in another.
> >>
> >> What would you like for it to return instead, in order to get the
> >> information you need?
> >>
> >> Cheers,
> >>
> >> Andrew
> >> da...@dalkescientific.com
> >>
> >>
> >>
> --
> >> Is your legacy SCM system holding you back? Join Perforce May 7 to find
> out:
> >> • 3 signs your SCM is hindering your productivity
> >> • Requirements for releasing software faster
> >> • Expert tips and advice for migrating your SCM now
> >> http://p.sf.net/sfu/perforce
> >> ___
> >> Rdkit-discuss mailing list
> >> Rdkit-discuss@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >
> >
> >
> >
> > --
> > ==
> >   Sushil Kumar Mishra, PhD
> > --
> >   CEITEC- Central European Institute of Technology. M

Re: [Rdkit-discuss] MCS-based similarity in carbohydrates

2014-05-08 Thread Sushil Mishra
Hello again,

Using isotopes seems to produce something that I am unable to understand.
Sorry, I am new to Python, so maybe I am not doing it correctly. I
have two mol2 structures stored in mol2str_1 and mol2str_2. Only the atoms
of the MCS have the same atom names in both; the rest of the names don't
match each other at all. What I am doing is:
-
import rdkit.Chem
import rdkit.Chem.MCS  # MCS module of that era, later superseded by rdFMCS

# mol2str_1, mol2str_2 and maxtime come from elsewhere in the script
mol1 = rdkit.Chem.MolFromMol2Block(mol2str_1, sanitize=False,
                                   removeHs=False)
mol2 = rdkit.Chem.MolFromMol2Block(mol2str_2, sanitize=False,
                                   removeHs=False)
mcs = rdkit.Chem.MCS.FindMCS((mol1, mol2), maximize='atoms',
                             atomCompare='isotopes', bondCompare='bondtypes',
                             ringMatchesRingOnly=True,
                             completeRingsOnly=True, timeout=maxtime)
print(mcs)
--
It is printing:
'[0*](-!@[0*])(-!@[0*])(-!@[0*])-!@[0*]-!@[0*]-@1(-!@[0*])-@[0*]-@[0*](-!@[0*])(-@[0*](-@[0*](-@[0*]-@1(-!@[0*]-!@[0*])-!@[0*])(-!@[0*]-!@[0*])-!@[0*])(-!@[0*]-!@[0*])-!@[0*])-!@[0*](-!@[0*])(-!@[0*])-!@[0*]'

Can someone tell me whether I am making a mistake here?

/Sushil



On Thu, May 8, 2014 at 1:57 PM, Sushil Mishra 
wrote:
>
> Hi Andrew,
>
> Thanks a lot for the suggestions. The MCS result I am getting is
> similar to what you said:
>
> '[#6](-!@[#6]-!@[#6]-@1(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6](-!@[#6])(-!@[#6])-!@[#6])(-!@[#6])-@[#6]-@1)(-!@[#6])(-!@[#6])-!@[#6]'
>
>
>
> Your idea of using atomCompare = 'isotopes' seems good and I will
> give it a try.
>
> I am using this approach to prepare input structures for Free Energy
> Perturbation calculations. Thus, if I have structures A and B, I would
> like to generate starting structures for the calculations which contain
> the MCS atoms, plus the rest of the atoms of A (which will be changed to
> dummies during the calculations), plus the rest of the atoms of B as
> dummy atoms (dummies in the input structure, changed to their respective
> atoms during the calculations). There can be another structure for
> perturbing B into A.
>
> I will see if "isotopes" can solve the problem.
>
> Thanks
> Sushil
>
>
>
> On Thu, May 8, 2014 at 1:04 PM, Andrew Dalke 
wrote:
>>
>> Hi Sushil,
>>
>> On May 8, 2014, at 12:26 PM, Sushil Mishra wrote:
>> > The MCS algorithm seems to me unable to handle chiral carbons, and it
>> > cannot differentiate chiral changes in ligands.
>>
>> That's correct. The MCS algorithm in RDKit doesn't consider chirality.
>> While in principle I think it would be possible to extend the current
>> algorithm to support it, it would require some extensive changes.
>>
>> > Moreover, it also fails to differentiate between atoms in symmetrical
>> > positions. For example, I have a 6-atom ring (C1, C2, C3, C4, C5, O5)
>> > with one -CH3 at C1 and another structure with -CH3 at C5. MCS cannot
>> > differentiate such structures.
>>
>> The MCS search returns a SMARTS pattern, in this case something like:
>>
>>   [#6]1~[#6]~[#6]~[#6]~[#6](~[#6])~[#8]1
>>
>> I don't think there's any way for a SMARTS, or at least a non-recursive
>> SMARTS, to handle those other than symmetrically.
>>
>> It may be possible, through isotope labeling, for you to define your own
>> atom classes, so that the C1 atom in one carbohydrate can only ever match
>> the C1 atom in another.
>>
>> What would you like for it to return instead, in order to get the
>> information you need?
>>
>> Cheers,
>>
>> Andrew
>> da...@dalkescientific.com
>>
>>
>>
>
>
>
>
> --
> ==
>   Sushil Kumar Mishra, PhD
> --
>   CEITEC- Central European Institute of Technology. MU
>   ILBIT, Building A4 - 2.12
>   Kamenice 5, Brno 625 00
>   Czech Republic
> --
>   Email : sus...@chemi.muni.cz
>   Phone : +420-549 496 307
> ==




-- 
==
  Sushil Kumar Mishra, PhD
--
  CEITEC- Central European Institute of Technology. MU
  ILBIT

[Rdkit-discuss] Fwd: Tautomeric InChIs

2014-05-08 Thread Markus Sitzmann
-- Forwarded message --
From: Markus Sitzmann 
Date: Thu, May 8, 2014 at 3:27 PM
Subject: Re: [Rdkit-discuss] Tautomeric InChIs
To: Edward Pyzer-Knapp 


Hi Edward,

since your InChI is a Standard InChI ("1S/"): tautomeric forms are
purposely *not* preserved by Standard InChI - that's why we created
Standard InChI (with non-standard InChIs it is another story, those
you can make tautomer-sensitive or insensitive). And actually many
people complain that Standard InChI falls short in some cases
regarding tautomer normalization :-).

Best,
Markus

On Thu, May 8, 2014 at 3:16 PM, Edward Pyzer-Knapp
 wrote:
> Hi all,
>
> I have been playing around with RDKit for a while now - great work guys!
>
> I have recently hit an issue when using InChIs:
>
> When generating both an InChI and a SMILES from an RDKit Mol, I get two
> different structures, even if I use the SMILES as the input for the InChI
> generation.
>
> An example (I should add that this SMILES was generated by RDKit from a
> Mol file):
>
> from rdkit.Chem import MolFromSmiles, MolToInchi
>
> smiles = "[H]N1C(=O)C(=C2C(=O)c3c(Cl)sc(F)c3N2[H])c2sc(F)c(Cl)c21"
> mol = MolFromSmiles(smiles)
> inchi = MolToInchi(mol)
> print(inchi)
> InChI=1S/C12H2Cl2F2N2O2S2/c13-3-6-8(21-10(3)15)2(12(20)18-6)4-7(19)1-5(17-4)11(16)22-9(1)14/h17H,(H,18,20)
>
> When comparing the SMILES and the InChI, the C=O has changed to an OH and
> a C-N-H has changed to a C=N. I realise that these are tautomers of each
> other, but surely the tautomeric form should be preserved when converting
> SMILES to InChI? At the moment, going SMILES->InChI->SMILES does NOT
> result in the original SMILES...
>
> There is a layer in the InChI standard which would allow description of
> this; is there a way to turn that on?
>
> Many Thanks,
>
> Ed Pyzer-Knapp
>
> --
> Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
> • 3 signs your SCM is hindering your productivity
> • Requirements for releasing software faster
> • Expert tips and advice for migrating your SCM now
> http://p.sf.net/sfu/perforce
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>



[Rdkit-discuss] Fwd: RDKit cartridge similarity search speeds(?)

2014-05-08 Thread Markus Sitzmann
-- Forwarded message --
From: Markus Sitzmann 
Date: Thu, May 8, 2014 at 3:14 PM
Subject: Re: [Rdkit-discuss] RDKit cartridge similarity search speeds(?)
To: James Davidson 


Hi James,

I would guess that in your second query "morganbv_fp('c1nnccc1'::mol, 2)"
has to be calculated for each row you are scanning: from the database's
perspective the result is unpredictable (although it is not), so it cannot
be optimized so easily. All of this is avoided in your first query: the
calculation is done once before the table scan, and the actual index/table
scan is then a rather simple one.
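The plans James posted also point to a second factor. In QUERY 2 the query fingerprint already appears as a constant bfp literal inside the Filter, and the plan is a Seq Scan that evaluates tanimoto_sml() on every row ("Rows Removed by Filter: 1352377"). In QUERY 1, by contrast, the % operator is answered through the fps_ecfp4bv_idx index (Bitmap Index Scan). In general, an index helps because a similarity threshold lets many rows be rejected without a full comparison.

The plain-Python sketch below illustrates one such screen: the bit-count bound T(A,B) <= min(|A|,|B|) / max(|A|,|B|), applied to fingerprints stored as integers. It is a generic illustration with hypothetical function names, not a description of how the RDKit cartridge's index works internally. It also counts a hit as T >= threshold, which is purely a convention of this sketch.

```python
def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two bit fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty fingerprints
    return popcount(a & b) / union


def tanimoto_upper_bound(n_a: int, n_b: int) -> float:
    """Best Tanimoto achievable given only the two bit counts: min / max."""
    if max(n_a, n_b) == 0:
        return 0.0
    return min(n_a, n_b) / max(n_a, n_b)


def threshold_search(query, fps, threshold):
    """Return (indices with Tanimoto >= threshold, number of full comparisons made)."""
    n_q = popcount(query)
    counts = [popcount(fp) for fp in fps]  # an index would store these ahead of time
    hits, full_checks = [], 0
    for i, (fp, n_fp) in enumerate(zip(fps, counts)):
        if tanimoto_upper_bound(n_q, n_fp) < threshold:
            continue  # cannot reach the threshold, so skip the full comparison
        full_checks += 1
        if tanimoto(query, fp) >= threshold:
            hits.append(i)
    return hits, full_checks


print(threshold_search(0b1111, [0b1111, 0b0001, 0b1110, 0b11110000], 0.5))
# ([0, 2], 3): the 1-bit fingerprint is rejected by its bit count alone
```

The bound holds because the intersection can never be larger than the smaller fingerprint, and the union can never be smaller than the larger one.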

Markus


[Rdkit-discuss] Tautomeric InChIs

2014-05-08 Thread Edward Pyzer-Knapp
Hi all,

I have been playing around with RDKit for a while now - great work, guys!

I have recently hit an issue when using InChIs:

When generating both an InChI and a SMILES from an RDKit Mol, I get two
different structures, even if I use the SMILES as the input for InChI
generation.

An example (I should add that this SMILES was generated by RDKit from a
Mol file):

from rdkit.Chem import MolFromSmiles, MolToInchi

smiles = "[H]N1C(=O)C(=C2C(=O)c3c(Cl)sc(F)c3N2[H])c2sc(F)c(Cl)c21"
mol = MolFromSmiles(smiles)
inchi = MolToInchi(mol)

print(inchi)
# InChI=1S/C12H2Cl2F2N2O2S2/c13-3-6-8(21-10(3)15)2(12(20)18-6)4-7(19)1-5(17-4)11(16)22-9(1)14/h17H,(H,18,20)

When comparing the SMILES and the InChI, the C=O has changed to an OH and a
C-N-H has changed to a C=N. I realise that these are tautomers of each
other, but surely the tautomeric form should be preserved when converting
SMILES to InChI? At the moment, going SMILES -> InChI -> SMILES does NOT
give back the original SMILES...

There is a layer in the InChI standard that would allow this to be
described; is there a way to turn it on?

Many Thanks,

Ed Pyzer-Knapp


Re: [Rdkit-discuss] MCS-based similarity in carbohydrates

2014-05-08 Thread Sushil Mishra
Hi Andrew,

Thanks a lot for the suggestions. The MCS result I am getting is similar to
what you said:
'[#6](-!@[#6]-!@[#6]-@1(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6]-!@[#6])(-!@[#6])-@[#6](-!@[#6](-!@[#6])(-!@[#6])-!@[#6])(-!@[#6])-@[#6]-@1)(-!@[#6])(-!@[#6])-!@[#6]'


Your idea of using atomCompare = 'isotopes' seems good, and I will give it
a try.

I am using this approach to prepare input structures for Free Energy
Perturbation calculations. So, given structures A and B, I would like to
generate a starting structure for the calculations that contains the MCS
atoms, plus the rest of the atoms of A (which will only be changed to
dummies during the calculation), plus the rest of the atoms of B as dummy
atoms (dummies in the input structure, changed to their respective atoms
during the calculation). There can also be a second structure for
perturbing B into A.
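The atom bookkeeping described above can be sketched in plain Python. Everything here is hypothetical and is not an RDKit or FEP-package API: the helper name, the dict-based atom representation and the toy molecules are all invented for illustration. The sketch assumes the MCS has already been turned into an atom-index mapping from A to B, and it ignores coordinates, bonds and force-field parameters.

```python
def hybrid_start_structure(atoms_a, atoms_b, mcs_map):
    """Partition atoms for a hypothetical A -> B perturbation start structure.

    atoms_a, atoms_b: {atom index: element symbol} for molecules A and B
    mcs_map: {index in A: index in B} for the atoms matched by the MCS
    """
    mapped_in_b = set(mcs_map.values())
    return {
        # MCS atoms: real at both ends of the perturbation
        "common": [(i, atoms_a[i]) for i in sorted(mcs_map)],
        # rest of A: real in the input, turned into dummies during the calculation
        "a_only": [(i, el) for i, el in sorted(atoms_a.items()) if i not in mcs_map],
        # rest of B: dummies in the input, turned into real atoms during the calculation
        "b_dummies": [(j, el) for j, el in sorted(atoms_b.items()) if j not in mapped_in_b],
    }


atoms_a = {0: "C", 1: "C", 2: "O"}
atoms_b = {0: "C", 1: "C", 2: "N", 3: "C"}
mcs_map = {0: 0, 1: 1}

forward = hybrid_start_structure(atoms_a, atoms_b, mcs_map)
# For perturbing B into A, swap the roles and invert the mapping.
reverse = hybrid_start_structure(atoms_b, atoms_a, {b: a for a, b in mcs_map.items()})
print(forward["b_dummies"])  # [(2, 'N'), (3, 'C')]
print(reverse["b_dummies"])  # [(2, 'O')]
```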

I will see if "isotopes" can solve the problem.

Thanks
Sushil






-- 
==
  Sushil Kumar Mishra, PhD
--
  CEITEC- Central European Institute of Technology. MU
  ILBIT, Building A4 - 2.12
  Kamenice 5, Brno 625 00
  Czech Republic
--
  Email : sus...@chemi.muni.cz
  Phone : +420-549 496 307
==


[Rdkit-discuss] RDKit cartridge similarity search speeds(?)

2014-05-08 Thread James Davidson
Dear All,

I have recently been spending a bit more time with the RDKit cartridge, and 
have what is probably a very naïve question...
Having built some RDKit fingerprints for ChEMBL_18, I see the following 
behaviour (for clarification - 'ecfp4_bv' is the column in my rdk.fps table 
that has been generated using morganbv_fp(mol, 2)):


chembl_18=# \timing on
Timing is on.

chembl_18=# set rdkit.tanimoto_threshold=0.5;
SET
Time: 0.167 ms

chembl_18=# select chembl_id from rdk.fps where ecfp4_bv % 
morganbv_fp('c1nnccc1'::mol,2);
  chembl_id
-
CHEMBL15719
(1 row)

Time: 2033.348 ms

chembl_18=# select chembl_id from rdk.fps where tanimoto_sml(ecfp4_bv, 
morganbv_fp('c1nnccc1'::mol, 2)) > 0.5;
  chembl_id
-
CHEMBL15719
(1 row)

Time: 6843.605 ms


I can see that the query plans are different in the two cases, but I don't 
fully understand why - see below:

QUERY 1 (with explain analyze)
chembl_18=# explain analyze select chembl_id from rdk.fps where ecfp4_bv % morganbv_fp('c1nnccc1'::mol,2);

 QUERY PLAN
---
Bitmap Heap Scan on fps  (cost=106.91..5298.31 rows=1352 width=13) (actual time=1774.986..1774.987 rows=1 loops=1)
   Recheck Cond: (ecfp4_bv % '\x0100084200048204'::bfp)
   ->  Bitmap Index Scan on fps_ecfp4bv_idx  (cost=0.00..106.57 rows=1352 width=0) (actual time=1774.969..1774.969 rows=1 loops=1)
         Index Cond: (ecfp4_bv % '\x0100084200048204'::bfp)
Total runtime: 1775.035 ms
(5 rows)

Time: 1776.133 ms


QUERY 2 (with explain analyze)
chembl_18=# explain analyze select chembl_id from rdk.fps where tanimoto_sml(ecfp4_bv, morganbv_fp('c1nnccc1'::mol, 2)) > 0.5;

  QUERY PLAN
---
Seq Scan on fps  (cost=0.00..388808.17 rows=450793 width=13) (actual time=1278.115..6953.977 rows=1 loops=1)
   Filter: (tanimoto_sml(ecfp4_bv, '\x0100084200048204'::bfp) > 0.5::double precision)
   Rows Removed by Filter: 1352377
Total runtime: 6954.010 ms
(4 rows)

Time: 6955.103 ms


It seems conceptually 'easier' to add the similarity value as part of the 
query, rather than setting it as a variable ahead of the query; but clearly I 
should be doing it the latter way for performance reasons.  So even if I don't 
fully understand why at the moment, am I correct in thinking that queries of 
this sort should always be run with the similarity operators (%, #)?  And if 
so, is the rdkit.tanimoto_threshold variable set at the level of the session, 
the user, or the database?

Kind regards

James


Re: [Rdkit-discuss] MCS-based similarity in carbohydrates

2014-05-08 Thread Andrew Dalke
Hi Sushil,

On May 8, 2014, at 12:26 PM, Sushil Mishra wrote:
> The MCS algorithm seems to me unable to handle chiral carbons, and it cannot
> differentiate chiral changes in ligands.

That's correct. The MCS algorithm in RDKit doesn't consider chirality. While in 
principle I think it would be possible to extend the current algorithm to 
support it, it would require some extensive changes.

> Moreover, it also fails to differentiate between atoms in symmetrical
> positions. For example, I have a 6-atom ring (C1, C2, C3, C4, C5, O5) with
> one -CH3 at C1, and another structure with -CH3 at C5. MCS cannot
> differentiate such structures.

The MCS search returns a SMARTS pattern, in this case something like:

  [#6]1~[#6]~[#6]~[#6]~[#6](~[#6])~[#8]1

I don't think there's any way for a SMARTS, or at least a non-recursive SMARTS,
to handle those other than symmetrically.

It may be possible, through isotope labeling, for you to define your own atom 
classes, so that the C1 atom in one carbohydrate can only ever match the C1 
atom in another.
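The atom-class idea can be illustrated without RDKit. In the plain-Python sketch below, the helper names are hypothetical, and hydrogens and bond orders are ignored. Every heavy atom carries a unique class label such as "C1" or "O5", standing in for an isotope label, and a substituent is labelled by where it is attached. Because an atom may only match an atom with the same label, the mapping is fixed. The MCS then reduces to the largest connected set of bonds the two molecules share, so the C1-methyl and C5-methyl structures from Sushil's example no longer match by symmetry. In RDKit itself, the labels would be set as isotopes and the MCS search told to compare isotopes rather than elements. That step is not shown here.

```python
from collections import defaultdict


def bond(x, y):
    """An undirected bond between two labelled atoms (bond orders are ignored)."""
    return frozenset((x, y))


def labelled_common_substructure(bonds_a, bonds_b):
    """Largest connected common substructure when every atom has a unique label.

    Atoms may only match atoms with the same label, so the atom mapping is fixed
    and the common substructure is the largest connected piece of the shared bonds.
    """
    neighbours = defaultdict(set)
    for b in bonds_a & bonds_b:
        x, y = tuple(b)
        neighbours[x].add(y)
        neighbours[y].add(x)
    best, seen = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over the shared bonds
            atom = stack.pop()
            if atom not in component:
                component.add(atom)
                stack.extend(neighbours[atom] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Pyranose-like ring C1-C2-C3-C4-C5-O5, labelled by ring position.
ring = {bond("C1", "C2"), bond("C2", "C3"), bond("C3", "C4"),
        bond("C4", "C5"), bond("C5", "O5"), bond("O5", "C1")}
methyl_at_c1 = ring | {bond("C1", "Me@C1")}
methyl_at_c5 = ring | {bond("C5", "Me@C5")}

print(sorted(labelled_common_substructure(methyl_at_c1, methyl_at_c5)))
# ['C1', 'C2', 'C3', 'C4', 'C5', 'O5']: the two methyl groups no longer match
```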

What would you like for it to return instead, in order to get the information 
you need?

Cheers,

Andrew
da...@dalkescientific.com




[Rdkit-discuss] MCS-based similarity in carbohydrates

2014-05-08 Thread Sushil Mishra
Dear All,

I am trying to calculate MCS-based similarity between monosaccharides.
The monosaccharides I would like to compare differ in the positions of -H,
-OH or -CH3 groups (above/below the plane). The MCS algorithm seems to me
unable to handle chiral carbons, and it cannot differentiate chiral
changes in ligands.

Moreover, it also fails to differentiate between atoms in symmetrical
positions. For example, I have a 6-atom ring (C1, C2, C3, C4, C5, O5) with
one -CH3 at C1, and another structure with -CH3 at C5. MCS cannot
differentiate such structures.

Is there a way, or a workaround, to get the correct common substructure?

Thanking you
Sushil


Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

2014-05-08 Thread Stephen O'hagan
It appears that Eclipse PyDev's code completion and syntax colouring were
fooling me!

Get3DDistanceMatrix is flagged as “undefined”, but the code runs just fine!?

Cheers,
Steve.

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 08 May 2014 02:52
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

Hmm, it is definitely there.
If you built from source and are using the new build, it should be available as:
Chem.Get3DDistanceMatrix()
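As a reminder of what this function computes: a 3D distance matrix holds the Euclidean distance between every pair of atoms of one conformer. The plain-Python sketch below reproduces that quantity for a list of (x, y, z) tuples. The function name is hypothetical, and math.dist requires Python 3.8 or later. This only illustrates the output; it is not RDKit's implementation, which takes a molecule with 3D coordinates rather than a bare coordinate list.

```python
import math


def distance_matrix_3d(coords):
    """Symmetric matrix of Euclidean distances between 3D points (x, y, z)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
for row in distance_matrix_3d(coords):
    print([round(d, 3) for d in row])
# [0.0, 5.0, 1.0]
# [5.0, 0.0, 5.099]
# [1.0, 5.099, 0.0]
```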

-greg


On Wed, May 7, 2014 at 3:48 PM, Stephen O'hagan <soha...@manchester.ac.uk> wrote:
I still don’t see it in the beta of the Q1 2014 release?

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 02 May 2014 15:00
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

> I can find no Get3DDistanceMatrix defined?

It is, unfortunately, a new feature. It's in the GitHub version of the RDKit
and will be in the next release (available next week).


