Re: [Rdkit-discuss] elimination of small fragments
Chris,

Absolutely agree with your points: parsing the molecules with RDKit is much more robust, but it depends on how many you've got to process. If you're processing millions to billions, the parsing overhead can become a problem, and doing it in two steps (lexical, then graph) can be the pragmatic solution.

Desalting by removing the smallest fragment performs as expected, but take pyrrolidinium tosylate: which part of the salt do you want to discard? If you don't know, it's hard to create a heuristic.

C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O

Ed

Dr Ed Griffen, Technical Director
mobile +44 7762 121593
office +44 1625 238843
ed.grif...@medchemica.com
www.medchemica.com
skype: ed.griffen
Twitter: @MedChemica
Medchemica Ltd is a company registered in England and Wales with company number 8162245.

Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful.

Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by MedChemica Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email.

Monitoring: MedChemica Limited retains and monitors all email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking compliance with our policies.
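A minimal sketch of the two-step idea, assuming that a record without a '.' has a single component and can skip RDKit parsing entirely; only multi-component SMILES pay the parsing cost, and the fragment with the most heavy atoms is kept. The function name `keep_largest` is hypothetical. Run on the pyrrolidinium tosylate above, it keeps the tosylate (11 heavy atoms versus 5), which may or may not be the part you wanted, so it illustrates the throughput trick rather than resolving the heuristic question.

```python
from rdkit import Chem


def keep_largest(smiles: str) -> str:
    """Hypothetical two-step desalter: lexical check first, graph parse only if needed."""
    # Step 1 (lexical, cheap): no '.' means a single component, so return it untouched.
    if "." not in smiles:
        return smiles
    # Step 2 (graph): parse with RDKit and keep the fragment with the most heavy atoms.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    frags = Chem.GetMolFrags(mol, asMols=True)
    return Chem.MolToSmiles(max(frags, key=lambda m: m.GetNumHeavyAtoms()))


salt = "C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O"  # pyrrolidinium tosylate
print(keep_largest(salt))  # the tosylate anion is kept, the pyrrolidinium is dropped
```

Note that single-component records come back exactly as given (not canonicalised) and are not validated, which is the price of skipping the parse.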
> On 29 Jun 2018, at 11:59, Chris Earnshaw wrote:
>
> I'd say that using RDKit to calculate the numbers of heavy atoms is
> significantly more robust than a purely lexical approach, and it's easy to
> implement.
>
> It's also dangerous to just discard the smallest fragment. Years ago I worked
> on a project where the active molecule had only 11 heavy atoms and the
> counterion (dicyclohexylamine) had 13, so relying on atom counts is a way to
> sometimes throw the baby out with the bath water. It's much safer (but also a
> lot more work) to build a desalter/desolvater that explicitly removes just
> the fragments you really want to remove.
>
> Best regards,
> Chris
>
> On 29 June 2018 at 09:56, Ed Griffen <ed.grif...@medchemica.com> wrote:
> [...]
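Chris's explicit desalter can be sketched as a lookup of each fragment's canonical SMILES against a list of components you really want to remove. The list below is a hypothetical, deliberately short placeholder, not a curated desalting dictionary; it holds dicyclohexylamine in both neutral and protonated forms because canonical SMILES distinguish charge states. The benzoate parent is made up for illustration: it has fewer heavy atoms (9) than dicyclohexylammonium (13), yet it is the fragment that survives.

```python
from rdkit import Chem

# Hypothetical, deliberately short list of components to strip (not a curated dictionary).
UNWANTED_SMILES = [
    "O",                         # water
    "[Na+]",                     # sodium
    "[Cl-]",                     # chloride
    "N(C1CCCCC1)C1CCCCC1",       # dicyclohexylamine
    "[NH2+](C1CCCCC1)C1CCCCC1",  # dicyclohexylammonium
]
# Canonicalise once so that lookups compare like with like.
UNWANTED = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in UNWANTED_SMILES}


def strip_known_fragments(smiles: str) -> str:
    """Remove only the components listed in UNWANTED; keep everything else."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    frags = Chem.GetMolFrags(mol, asMols=True)
    kept = [f for f in frags if Chem.MolToSmiles(f) not in UNWANTED]
    # If every component is on the list (e.g. plain water), keep them all
    # rather than returning an empty string.
    if not kept:
        kept = list(frags)
    return ".".join(Chem.MolToSmiles(f) for f in kept)


# A made-up 9-heavy-atom benzoate paired with a 13-heavy-atom dicyclohexylammonium.
print(strip_known_fragments("O=C([O-])c1ccccc1.[NH2+](C1CCCCC1)C1CCCCC1"))
```

Size plays no part in the decision here, which is the point: anything not on the list is kept, so an unexpected counterion shows up in the output instead of silently deciding which fragment is the parent.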
Re: [Rdkit-discuss] elimination of small fragments
Using the string length to find the number of atoms in a molecule is OK, but you need to take account of the additional characters in SMILES that are not atoms, for example:

- two-letter elements, like silicon, chlorine, etc.
- brackets, ring closures, charges, explicit hydrogens

It's simple to do. Here's a worked example:

>>> SMILES = 'C[S@@+]([O-])c1ccc(cc1)[Si](C)(C)C'
>>> print(len(SMILES))
34
>>> heavies = [char for char in SMILES if char not in
...            r'''()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@''']
>>> print(len(heavies))
13

Obviously you do this after splitting on the '.'.

Best regards,

Ed
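A slightly more careful lexical count, as a sketch: instead of filtering characters, it tokenises atoms with a regular expression (bracket atoms first, then the two-letter organic-subset symbols Br and Cl, then one-letter symbols) and counts a bracket atom unless its element is hydrogen or the wildcard `*`. The character filter above miscounts some bracket elements: `[Sn]` counts as 2 because 'n' is not excluded, and `[Hg]` counts as 0 because both 'H' and 'g' are. The regex and the function name `heavy_atom_count` are assumptions for illustration; this is still not a SMILES parser and does no validation.

```python
import re

# Atom tokens, tried left to right: bracket atoms, then Br/Cl, then one-letter
# organic-subset symbols (aliphatic and aromatic). Bonds, branches, ring-closure
# digits and %nn are simply not matched, so findall skips them.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Za-z][a-z]?|\*)")


def heavy_atom_count(smiles: str) -> int:
    """Count non-hydrogen atoms in one SMILES component without building a graph."""
    count = 0
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            # [H], [2H], [H+] are hydrogens; [*] is a wildcard, not a heavy atom.
            if match and match.group(1) not in ("H", "*"):
                count += 1
        else:
            count += 1
    return count


smiles = "C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O"
print([heavy_atom_count(part) for part in smiles.split(".")])  # [5, 11]
```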
> On 29 Jun 2018, at 06:37, Alfredo Quevedo wrote:
>
> Thank you Hideyoshi for your feedback.
> Regards,
> Alfredo
>
> On 28 June 2018, at 21:43, "藤秀義" <hideyoshif...@gmail.com> wrote:
>
> Dear Alfredo,
>
> Although it is based on the length of the SMILES string rather than strictly
> on the number of atoms, the simplest way is to use Python built-in functions
> as follows:
>
> smiles = 'CCC.CC'
> fragment = max(smiles.split('.'), key=len)
> print(fragment)
>
> Best regards,
>
> Hideyoshi
>
> Thank you Paolo for this help, I will study the code and try it.
> Best regards,
>
> Alfredo
>
> On 28 June 2018, at 17:08, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
>
> Dear Alfredo,
>
> if you wish to keep only the largest disconnected fragment you may try
> the following:
>
> from rdkit.Chem import rdmolops
>
> mols = list(rdmolops.GetMolFrags(mol, asMols=True))
> if mols:
>     mols.sort(reverse=True, key=lambda m: m.GetNumAtoms())
>     mol = mols[0]
>
> Hope that helps, cheers
> p.
>
> On 06/28/18 19:38, Alfredo Quevedo wrote:
>
> Good afternoon,
>
> I would like to filter out small fragments from a list of molecules
> using the strategy below:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit.Chem import SaltRemover
>
> remover = SaltRemover.SaltRemover()
> mol = Chem.MolFromSmiles('CCC.CC')
> res = remover.StripMol(mol)
> print(res.GetNumAtoms())
>
> I am getting 5 atoms as output, so the 'CC' is not being stripped (the
> script works OK for salts). Is there any way of filtering out small
> fragments that are not salts?
>
> Thank you very much in advance,
>
> regards,
>
> Alfredo
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
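SaltRemover only strips fragments that match its salt definitions, which is why the ethane in 'CCC.CC' survives. The sketch below follows Paolo's GetMolFrags approach but picks the winner with max() on GetNumHeavyAtoms() instead of sorting on GetNumAtoms(), so explicit hydrogen atoms cannot inflate a fragment's count, and it compares the result with Hideyoshi's string-length heuristic. The nitrate example is made up to show where the two disagree: '[O-][N+](=O)[O-]' is 16 characters but only 4 heavy atoms, against 6 characters and 6 atoms for hexane. The function name `largest_fragment` is hypothetical.

```python
from rdkit import Chem


def largest_fragment(smiles: str) -> str:
    """Canonical SMILES of the fragment with the most heavy atoms."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    frags = Chem.GetMolFrags(mol, asMols=True)
    # max() returns the first fragment with the highest count when there is a tie.
    return Chem.MolToSmiles(max(frags, key=lambda m: m.GetNumHeavyAtoms()))


for smi in ["CCC.CC", "CCCCCC.[O-][N+](=O)[O-]"]:
    by_length = max(smi.split("."), key=len)  # string-length heuristic
    by_atoms = largest_fragment(smi)          # heavy-atom count on the parsed graph
    print(f"{smi}: by length -> {by_length}, by heavy atoms -> {by_atoms}")
```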
Re: [Rdkit-discuss] Find difference(s) between molecules
Jens,

I think what you're looking for is matched molecular pair finding. What you have described is, I think, encoding a transformation that captures the local chemical environment. There are open-source scripts for RDKit, and also proprietary methods. There are some subtleties in all pair-finding code that may need adjusting, as the configuration can make a significant difference. If you would like to discuss more, mail me directly; we have a huge amount of experience in the area.

http://pubs.acs.org/doi/abs/10.1021/jm200452d
http://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00335

Best regards,

Ed
> On 3 Nov 2017, at 13:15, Jens Kristian Munk wrote:
>
> Hi list,
>
> I've searched far and wide for an answer to this; I apologize if the answer
> is obvious...
>
> I can use rdFMCS
> (http://www.rdkit.org/Python_Docs/rdkit.Chem.rdFMCS-module.html) to find the
> maximum common substructure of a set of molecules... But how do I find the
> difference(s) between two (or more) molecules?
>
> I work with lipids a lot, so for example, the difference between palmitic
> acid (C16:0) and stearic acid (C18:0) is SMILES 'CC'. I would like RDKit to
> tell me just that, as well as tell me where on the maximum common
> substructure (which in this example is palmitic acid) to add the 'CC' to get
> stearic acid, i.e. on the terminus of the fatty chain.
>
> Any ideas?
>
> The example above is just the first step. After that comes identifying and
> locating double bonds in the fatty chains... And then jump to phospholipids,
> with two fatty chains and a head group...
>
> Kind regards,
>
> Jens Kristian Munk
> Chemist, Cand. Scient., Ph.D.
>
> Phone: 3862 0398
> Mobile: 5142 3483
> E-mail: jens.kristian.m...@regionh.dk
>
> Department of Clinical Biochemistry
> Amager og Hvidovre hospital
> Kettegård Allé 30
> 2650 Hvidovre
>
> Web: www.regionh.dk
>
> This e-mail contains confidential information. If you are not the intended
> recipient of this e-mail, or if you have received it in error, please inform
> the sender of the error by using the reply function. At the same time, please
> delete the e-mail immediately without forwarding or copying it.
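For the first step of Jens's question, a minimal sketch of an MCS-based difference (not a matched-molecular-pair implementation): find the MCS with rdFMCS, then remove that core from the larger molecule with Chem.ReplaceCore. With labelByIndex=True the dummy atom left at each attachment point carries, as its isotope label, the index of the core atom it was bonded to, which addresses the "where" part. It assumes the larger molecule contains the MCS in one unambiguous position and does not by itself handle the double-bond or phospholipid cases.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

# Palmitic acid (C16:0) and stearic acid (C18:0): n chain carbons ending in COOH.
palmitic = Chem.MolFromSmiles("C" * 16 + "(=O)O")
stearic = Chem.MolFromSmiles("C" * 18 + "(=O)O")

# 1. Maximum common substructure, returned as a SMARTS pattern.
mcs = rdFMCS.FindMCS([palmitic, stearic])
core = Chem.MolFromSmarts(mcs.smartsString)

# 2. Delete the core atoms from stearic acid; whatever is left is the difference.
#    labelByIndex=True labels each attachment dummy with the core atom index.
sidechains = Chem.ReplaceCore(stearic, core, labelByIndex=True)

for frag in Chem.GetMolFrags(sidechains, asMols=True):
    # Expect two carbons plus one labelled dummy atom marking the attachment point.
    print(Chem.MolToSmiles(frag))
```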
[Rdkit-discuss] Depicting reactions to the same quality as molecules
Is there a reaction depiction option similar to MolDraw2DCairo, which produces much better depictions than the simple Chem.Draw PIL images? Or am I just doing this wrong? Attempting to push a reaction through MolDraw2DCairo fails with:

Traceback (most recent call last):
  File "drawing_test.py", line 31, in <module>
    rc = rdMolDraw2D.PrepareMolForDrawing(rxn)
Boost.Python.ArgumentError: Python argument types in
    rdkit.Chem.Draw.rdMolDraw2D.PrepareMolForDrawing(ChemicalReaction)
did not match C++ signature:
    PrepareMolForDrawing(RDKit::ROMol const* mol, bool kekulize=True, bool addChiralHs=True, bool wedgeBonds=True, bool forceCoords=False)

Cheers,

Ed

Sample code below:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D

# Test molecule (assumed: N,N-dimethylaniline; any valid SMILES will do)
m1 = Chem.MolFromSmiles('c1ccccc1N(C)C')
rdDepictor.Compute2DCoords(m1)
Draw.MolToFile(m1, 'test_mol_image.png')

# Reaction image from the PIL-based drawing code
rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])[N:3]>>[N:1][C:3]=[O:2]')
rimage = Draw.ReactionToImage(rxn)
rimage.save('test_reaction_image.png')

# Molecule image via MolDraw2DCairo: this works
mc = rdMolDraw2D.PrepareMolForDrawing(m1)
drawer = Draw.MolDraw2DCairo(300, 300)
drawer.DrawMolecule(mc)
drawer.FinishDrawing()
output = drawer.GetDrawingText()
with open('test_mol_image_2.png', 'wb') as pngf:
    pngf.write(output)

# Reaction via MolDraw2DCairo: raises the ArgumentError above, because
# PrepareMolForDrawing() only accepts molecules (ROMol), not reactions
drawer2 = Draw.MolDraw2DCairo(600, 300)
rc = rdMolDraw2D.PrepareMolForDrawing(rxn)
drawer2.DrawMolecule(rc)
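A minimal sketch of drawing the same reaction with MolDraw2DCairo, assuming an RDKit build with Cairo support whose MolDraw2D classes provide DrawReaction() (recent releases do). The reaction is passed straight to DrawReaction(); PrepareMolForDrawing() is not involved. Without Cairo, rdMolDraw2D.MolDraw2DSVG accepts the same calls and returns SVG text instead of PNG bytes. The output filename is arbitrary.

```python
from rdkit.Chem import AllChem
from rdkit.Chem.Draw import rdMolDraw2D

# Same reaction SMARTS as in the sample code.
rxn = AllChem.ReactionFromSmarts('[C:1](=[O:2])[N:3]>>[N:1][C:3]=[O:2]')

# Draw the whole reaction (reactants, arrow, products) on one 600x300 canvas.
drawer = rdMolDraw2D.MolDraw2DCairo(600, 300)
drawer.DrawReaction(rxn)
drawer.FinishDrawing()
png = drawer.GetDrawingText()  # PNG bytes

with open('test_reaction_image_2.png', 'wb') as pngf:
    pngf.write(png)
```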