Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Ed Griffen
Chris, I absolutely agree with your points: processing the molecules in RDKit 
is much more robust, but it does depend on how many you've got to process. 
If you're doing millions to billions, the overhead can become a problem, 
and doing it in two steps (lexical, then graph) can be the pragmatic solution.
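
To make the two-step idea concrete, here is a minimal sketch of one way to split the work. It assumes that a '.' in a SMILES always separates fragments, and `graph_desalt` is a hypothetical callable standing in for whatever graph-based routine you trust (it is not an RDKit function). The stand-in `longest_fragment` is only there so the sketch runs without a toolkit.

```python
from typing import Callable, Iterable, Iterator


def two_step_desalt(
    smiles_records: Iterable[str],
    graph_desalt: Callable[[str], str],
) -> Iterator[str]:
    """Yield one SMILES per input record, parsing only where needed."""
    for smiles in smiles_records:
        if "." not in smiles:
            # Lexical step: a single-fragment record is passed through as-is.
            yield smiles
        else:
            # Graph step: only multi-fragment records pay the parsing cost.
            yield graph_desalt(smiles)


def longest_fragment(smiles: str) -> str:
    """Stand-in for the graph step (assumption: replace with real code)."""
    return max(smiles.split("."), key=len)


records = ["CCO", "CCC.CC", "c1ccccc1"]
print(list(two_step_desalt(records, longest_fragment)))
# ['CCO', 'CCC', 'c1ccccc1']
```

The saving depends entirely on how many records contain a '.', so it is worth counting those first.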

Desalting by removing the smallest fragment does exactly what it says, but 
take pyrrolidinium tosylate: which part of the salt do you want to discard? 
If you don't know, it's hard to create a heuristic.

C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O
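
As an illustration of the problem, the sketch below compares a size heuristic with an explicit removal list on the salt above. `KNOWN_COUNTERIONS` is a hypothetical lookup table, and exact string matching is an assumption that only holds if every record writes the counterion identically (for example after canonicalisation).

```python
# Hypothetical list of fragments to strip; extend it for your own data.
KNOWN_COUNTERIONS = {
    "Cc1ccc(cc1)S([O-])(=O)=O",  # tosylate, written exactly as in the record
    "[Cl-]",
    "[Na+]",
}


def keep_longest(smiles: str) -> str:
    """Size heuristic: keep the fragment with the longest SMILES string."""
    return max(smiles.split("."), key=len)


def strip_known_counterions(smiles: str) -> str:
    """Explicit approach: drop only fragments found in the lookup table."""
    fragments = smiles.split(".")
    kept = [frag for frag in fragments if frag not in KNOWN_COUNTERIONS]
    # If every fragment is on the list, return the record unchanged
    # rather than an empty string.
    return ".".join(kept) if kept else smiles


salt = "C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O"
print(keep_longest(salt))             # Cc1ccc(cc1)S([O-])(=O)=O
print(strip_known_counterions(salt))  # C1CC[NH2+]C1
```

The size heuristic keeps the tosylate here, which is only right if the tosylate is what you were after.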

Ed


Dr Ed Griffen,
Technical Director,
mobile  +44 7762 121593
office  +44 1625 238843
ed.grif...@medchemica.com
www.medchemica.com
skype: ed.griffen
Twitter: @MedChemica
Medchemica Ltd is a company registered in England and Wales with company number 
8162245.

Confidentiality Notice: This message is private and may contain confidential, 
proprietary and legally privileged information. If you have received this 
message in error, please notify us and remove it from your system and note that 
you must not copy, distribute or take any action in reliance on it. Any 
unauthorised use or disclosure of the contents of this message is not permitted 
and may be unlawful.
Disclaimer: Email messages may be subject to delays, interception, non-delivery 
and unauthorised alterations. Therefore, information expressed in this message 
is not given or endorsed by MedChemica Limited unless otherwise notified by an 
authorised representative independent of this message. No contractual 
relationship is created by this message by any person unless specifically 
indicated by agreement in writing other than email.
Monitoring: MedChemica Limited retains and monitors all email traffic data and 
content for the purposes of the prevention and detection of crime, ensuring the 
security of our computer systems and checking compliance with our policies.

> On 29 Jun 2018, at 11:59, Chris Earnshaw wrote:
> 
> I'd say that using RDKit to calculate the numbers of heavy atoms is 
> significantly more robust than a purely lexical approach - and it's easy to 
> implement.
> 
> It's also dangerous to just discard the smallest fragment. Years ago I worked 
> on a project where the active molecule had only 11 heavy atoms and the 
> counterion (dicyclohexylamine) had 13 - so relying on atom counts is a way to 
> sometimes throw the baby out with the bath water. It's much safer (but also a 
> lot more work) to build a desalter/desolvater that explicitly removes just 
> the fragments you really want to remove.
> 
> Best regards,
> Chris

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Ed Griffen
Using the string length to find the number of atoms in a molecule is OK - but 
you need to take account of the additional characters in SMILES that are not 
just atoms, for example:

two-letter elements, like silicon, chlorine etc.
brackets, ring closures, charges, explicit hydrogens

It's simple to do. Here's a worked example:

>>> SMILES = 'C[S@@+]([O-])c1ccc(cc1)[Si](C)(C)C'
>>> print(len(SMILES))
34
>>> heavies = [char for char in SMILES if char not in
...            r'''()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@''']
>>> print(len(heavies))
13

Obviously, you do this after splitting on the '.'.
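
Putting the split and the count together might look like the sketch below. The function names are illustrative only, and the exclusion set is the one from the session above. Note that it will overcount two-letter elements whose second letter is also an aromatic atom symbol (Sn, Sc, Co and so on), so treat the result as an estimate.

```python
# Characters that are not themselves a heavy atom (same set as the session).
NON_ATOM_CHARS = set(r"()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@")


def lexical_heavy_count(fragment: str) -> int:
    """Estimate the heavy-atom count of one fragment from its SMILES text."""
    return sum(1 for char in fragment if char not in NON_ATOM_CHARS)


def fragment_counts(smiles: str) -> dict:
    """Split on '.' and map each fragment to its estimated heavy-atom count."""
    return {frag: lexical_heavy_count(frag) for frag in smiles.split(".")}


print(lexical_heavy_count("C[S@@+]([O-])c1ccc(cc1)[Si](C)(C)C"))  # 13
print(fragment_counts("C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O"))
# {'C1CC[NH2+]C1': 5, 'Cc1ccc(cc1)S([O-])(=O)=O': 11}
```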

Best regards,

Ed


> On 29 Jun 2018, at 06:37, Alfredo Quevedo wrote:
> 
> thank you Hideyoshi for your feedback. 
> regards
> Alfredo
> 
> On 28 June 2018, at 21:43, "藤秀義" <hideyoshif...@gmail.com> wrote:
> Dear Alfredo,
> 
> Although it is based on the length of the SMILES string rather than strictly 
> on the number of atoms, the simplest way is to use Python built-in functions 
> as follows:
> 
> smiles = 'CCC.CC'
> fragment = max(smiles.split('.'), key=len)
> print(fragment)
> 
> Best regards,
> 
> Hideyoshi
> 
> 
> thank you Paolo for this help, I will study the code and try it,
> best regards
> 
> Alfredo
> 
> On 28 June 2018, at 17:08, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
> 
> Dear Alfredo,
> 
> if you wish to keep only the largest disconnected fragment you may try 
> the following:
> 
> from rdkit.Chem import rdmolops
> 
> # Split into disconnected fragments and keep the one with the most atoms
> mols = list(rdmolops.GetMolFrags(mol, asMols=True))
> if mols:
>     mol = max(mols, key=lambda m: m.GetNumAtoms())
> 
> Hope that helps, cheers
> p.
> 
> On 06/28/18 19:38, Alfredo Quevedo wrote:
>  Good afternoon,
> 
>  I would like to filter out small fragments from a list of molecules 
>  using the below strategy:
> 
>  from rdkit import Chem
>  from rdkit.Chem import AllChem
>  from rdkit.Chem import SaltRemover
> 
>  remover=SaltRemover.SaltRemover()
>  mol=Chem.MolFromSmiles('CCC.CC')
>  res=remover.StripMol(mol)
>  print(res.GetNumAtoms())
> 
> 
>  I am getting 5 atoms as output, so the 'CC' is not being stripped (the 
>  script works OK for salts). Is there any way of filtering out small 
>  non-salt fragments?
> 
>  thank you very much in advance,
> 
>  regards,
> 
>  Alfredo
> 
>  Check out the vibrant tech community on one of the world's most
>  engaging tech sites, Slashdot.org <http://slashdot.org/>! 
> http://sdm.link/slashdot <http://sdm.link/slashdot>
> 
>  Rdkit-discuss mailing list
>  Rdkit-discuss@lists.sourceforge.net 
> <mailto:Rdkit-discuss@lists.sourceforge.net>
>  https://lists.sourceforge.net/lists/listinfo/rdkit-discuss 
> <https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>


Re: [Rdkit-discuss] Find difference(s) between molecules

2017-11-03 Thread Ed Griffen
Jens,

I think what you're looking for is matched molecular pair finding.  What you 
have described I think is encoding a transformation that captures the local 
chemical environment.

There are open source scripts for RDKit, and also proprietary methods. 

There are some subtleties in all pair-finding code which may need adjusting, as 
the configuration can make a significant difference.

If you would like to discuss more, mail me directly, we have a huge amount of 
experience in the area.

http://pubs.acs.org/doi/abs/10.1021/jm200452d

http://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00335

Best regards,

Ed





> On 3 Nov 2017, at 13:15, Jens Kristian Munk wrote:
> 
> Hi list,
>  
> I’ve searched far and wide for an answer to this; I apologize if the answer 
> is obvious...
>  
> I can use rdFMCS 
> (http://www.rdkit.org/Python_Docs/rdkit.Chem.rdFMCS-module.html) to find the 
> maximum common substructure of a set of molecules... But how do I find the 
> difference(s) between two (or more) molecules?
>  
> I work with lipids a lot, so for example, the difference between palmitic 
> acid (C16:0) and stearic acid (C18:0) is SMILES 'CC'. I would like RDKit to 
> tell me just that, as well as tell me where on the maximum common 
> substructure (which in this example is palmitic acid) to add the 'CC' to get 
> stearic acid, i.e. on the terminus of the fatty chain.
>  
> Any ideas?
>  
> The example above is just the first step. After that comes identifying and 
> locating double bonds in the fatty chains... And then jump to phospholipids, 
> with two fatty chains and a head group... :)
>  
> Kind regards
>  
> Jens Kristian Munk
> Chemist, M.Sc., Ph.D.
>  
> Phone: 3862 0398
> Mobile: 5142 3483
> E-mail: jens.kristian.m...@regionh.dk
>  
> Department of Clinical Biochemistry
> Amager og Hvidovre Hospital
> Kettegård Allé 30
> 2650 Hvidovre
> 
> Web: www.regionh.dk
>  
> 
> 
> 
> This e-mail contains confidential information. If you are not the intended 
> recipient of this e-mail, or if you have received it by mistake, please 
> inform the sender of the error by using the reply function. Please also 
> delete the e-mail immediately without forwarding or copying it.


[Rdkit-discuss] Depicting reactions to the same quality as molecules

2017-05-19 Thread Ed Griffen
Is there a reaction depiction option similar to MolDraw2DCairo, which 
produces much better depictions than the simple Chem.Draw PIL images?

Or am I just doing this wrong?


Attempting to push a reaction through MolDraw2DCairo fails with:

Traceback (most recent call last):
  File "drawing_test.py", line 31, in <module>
    rc = rdMolDraw2D.PrepareMolForDrawing(rxn)
Boost.Python.ArgumentError: Python argument types in
    rdkit.Chem.Draw.rdMolDraw2D.PrepareMolForDrawing(ChemicalReaction)
did not match C++ signature:
    PrepareMolForDrawing(RDKit::ROMol const* mol, bool kekulize=True, bool addChiralHs=True, bool wedgeBonds=True, bool forceCoords=False)

Cheers,

Ed


sample code below:


from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import DrawingOptions

# Assumed N,N-dimethylaniline; the archived SMILES 'c1c1N(C)C' does not parse
m1 = AllChem.MolFromSmiles('c1ccccc1N(C)C')
tmp = AllChem.Compute2DCoords(m1)
Draw.MolToFile(m1,'test_mol_image.png')
rdDepictor.Compute2DCoords(m1)

rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])[N:3]>>[N:1][C:3]=[O:2]')
rimage = Draw.ReactionToImage(rxn)
rimage.save('test_reaction_image.png')

mc = rdMolDraw2D.PrepareMolForDrawing(m1)
drawer = Draw.MolDraw2DCairo(300, 300)
drawer.DrawMolecule(mc)
drawer.FinishDrawing()
output = drawer.GetDrawingText()
with open('test_mol_image_2.png', 'wb') as pngf:
    pngf.write(output)


drawer2 = Draw.MolDraw2DCairo(600, 300)
rc = rdMolDraw2D.PrepareMolForDrawing(rxn)
drawer2.DrawMolecule(rc)




