Chris, I absolutely agree with your points: processing the molecules in RDKit 
is much more robust, but it depends on how many you’ve got to process.  
If you’re doing millions to billions, the overhead can become a problem, 
and doing it in two steps (lexical, then graph) can be the pragmatic solution.
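
A minimal sketch of the two-step idea, under stated assumptions rather than as a definitive implementation: the cheap lexical count decides the clear-cut cases, and anything where the two largest fragments are close in size is handed to a slower graph-based routine. The names lexical_heavy_count, pick_fragment and graph_pick are hypothetical, graph_pick stands in for something RDKit-based, and the margin of 3 heavy atoms is an arbitrary placeholder.

```python
# Characters in a SMILES string that are not themselves heavy atoms:
# punctuation, digits, hydrogens, and second letters of two-letter elements.
NON_ATOMS = set(r"()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@")


def lexical_heavy_count(fragment):
    """Approximate heavy-atom count from the SMILES text alone."""
    return sum(1 for char in fragment if char not in NON_ATOMS)


def pick_fragment(smiles, graph_pick, margin=3):
    """Return the largest fragment, deferring close calls to graph_pick.

    graph_pick is a hypothetical callable (e.g. wrapping RDKit) that takes
    the full SMILES and returns the fragment to keep. margin is an assumed
    threshold in heavy atoms, not a recommended value.
    """
    fragments = smiles.split(".")
    if len(fragments) == 1:
        return smiles
    ranked = sorted(fragments, key=lexical_heavy_count, reverse=True)
    lead = lexical_heavy_count(ranked[0]) - lexical_heavy_count(ranked[1])
    if lead >= margin:
        return ranked[0]          # clear winner: stay lexical
    return graph_pick(smiles)     # too close to call: use the graph step
```

The character filter is approximate: lowercase c, n, o and s are kept because they are aromatic atoms, so a symbol such as [Sn] or [Co] is counted as two atoms. That is one more reason to send borderline cases to the graph step.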

Desalting by removing the smallest fragment does what it says, but take 
pyrrolidinium tosylate: which part of the salt do you want to discard? If you 
don’t know, it’s hard to create a heuristic.

C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O
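
For this example the size heuristic keeps the tosylate (11 heavy atoms) and discards the pyrrolidinium (5), which may or may not be what is wanted. A minimal sketch of the alternative, removing only fragments from an explicit list, follows. KNOWN_COUNTERIONS and strip_known are hypothetical names, the list entries are illustrative, and the plain string comparison assumes every SMILES is written exactly as in the list; in real use you would compare canonical SMILES or use substructure matching.

```python
# Hypothetical, hand-maintained list of fragments that should be discarded.
# Plain string comparison only works if the input SMILES are written
# exactly this way (an assumption; canonicalise first in real use).
KNOWN_COUNTERIONS = {
    "Cc1ccc(cc1)S([O-])(=O)=O",  # tosylate
    "[Cl-]",                      # chloride
    "[Na+]",                      # sodium
}


def strip_known(smiles, counterions=KNOWN_COUNTERIONS):
    """Drop listed fragments; return (kept_smiles, needs_review)."""
    kept = [frag for frag in smiles.split(".") if frag not in counterions]
    # Zero or several survivors means the list could not decide.
    needs_review = len(kept) != 1
    return ".".join(kept), needs_review


print(strip_known("C1CC[NH2+]C1.Cc1ccc(cc1)S([O-])(=O)=O"))
```

With tosylate on the list, the call above keeps the pyrrolidinium fragment and does not flag the record. An input such as 'CCC.CC', where nothing matches, comes back unchanged and flagged for review instead of being silently trimmed.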

Ed


Dr Ed Griffen,
Technical Director,
mobile  +44 7762 121593
office  +44 1625 238843
ed.grif...@medchemica.com
www.medchemica.com
skype: ed.griffen
Twitter: @MedChemica
Medchemica Ltd is a company registered in England and Wales with company number 
8162245.

Confidentiality Notice: This message is private and may contain confidential, 
proprietary and legally privileged information. If you have received this 
message in error, please notify us and remove it from your system and note that 
you must not copy, distribute or take any action in reliance on it. Any 
unauthorised use or disclosure of the contents of this message is not permitted 
and may be unlawful.
Disclaimer: Email messages may be subject to delays, interception, non-delivery 
and unauthorised alterations. Therefore, information expressed in this message 
is not given or endorsed by MedChemica Limited unless otherwise notified by an 
authorised representative independent of this message. No contractual 
relationship is created by this message by any person unless specifically 
indicated by agreement in writing other than email.
Monitoring: MedChemica Limited retains and monitors all email traffic data and 
content for the purposes of the prevention and detection of crime, ensuring the 
security of our computer systems and checking compliance with our policies.

> On 29 Jun 2018, at 11:59, Chris Earnshaw <cgearns...@gmail.com> wrote:
> 
> I'd say that using RDKit to calculate the number of heavy atoms is 
> significantly more robust than a purely lexical approach, and it's easy to 
> implement.
> 
> It's also dangerous to just discard the smallest fragment. Years ago I worked 
> on a project where the active molecule had only 11 heavy atoms and the 
> counterion (dicyclohexylamine) had 13 - so relying on atom counts is a way to 
> sometimes throw the baby out with the bath water. It's much safer (but also a 
> lot more work) to build a desalter/desolvater that explicitly removes just 
> the fragments you really want to remove.
> 
> Best regards,
> Chris
> 
> On 29 June 2018 at 09:56, Ed Griffen <ed.grif...@medchemica.com> wrote:
> Using the string length to find the number of atoms in a molecule is OK, but 
> you need to take account of the characters in SMILES that are not 
> themselves atoms, for example:
> 
> the second letter of two-letter elements, like silicon, chlorine etc.
> brackets, ring closures, charges, explicit hydrogens
> 
> It’s simple to do; here’s a worked example:
> 
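> >>> SMILES = 'C[S@@+]([O-])c1ccc(cc1)[Si](C)(C)C'
> >>> print(len(SMILES))
> 34
> >>> # Characters that are not heavy atoms: punctuation, digits, H, and
> >>> # the second letter of two-letter element symbols (raw string for the \)
> >>> non_atoms = set(r'()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@')
> >>> heavies = [char for char in SMILES if char not in non_atoms]
> >>> print(len(heavies))
> 13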
> 
> Obviously, you do this after splitting the SMILES on the '.' character.
> 
> Best regards,
> 
> Ed
> 
>> On 29 Jun 2018, at 06:37, Alfredo Quevedo <maquevedo....@gmail.com> wrote:
>> 
>> thank you Hideyoshi for your feedback. 
>> regards
>> Alfredo
>> 
>> On 28 June 2018, at 21:43, "藤秀義" <hideyoshif...@gmail.com> wrote:
>> Dear Alfredo,
>> 
>> Although it is based on the length of the SMILES string rather than 
>> strictly on the number of atoms, the simplest way is to use Python 
>> built-in functions as follows:
>> 
>> smiles = 'CCC.CC'
>> # Keep the fragment with the longest SMILES string
>> fragment = max(smiles.split('.'), key=len)
>> print(fragment)
>> 
>> Best regards,
>> 
>> Hideyoshi
>> 
>> 
>> thank you Paolo for this help, I will study the code and try it,
>> best regards
>> 
>> Alfredo
>> 
>> On 28 June 2018, at 17:08, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
>> 
>> Dear Alfredo,
>> 
>> if you wish to keep only the largest disconnected fragment you may try 
>> the following:
>> 
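>> from rdkit import Chem
>> from rdkit.Chem import rdmolops
>> mol = Chem.MolFromSmiles('CCC.CC')
>> # Split into disconnected fragments and keep the one with the most atoms
>> mols = list(rdmolops.GetMolFrags(mol, asMols=True))
>> if mols:
>>      mols.sort(reverse=True, key=lambda m: m.GetNumAtoms())
>>      mol = mols[0]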
>> 
>> Hope that helps, cheers
>> p.
>> 
>> On 06/28/18 19:38, Alfredo Quevedo wrote:
>>  Good afternoon,
>> 
>>  I would like to filter out small fragments from a list of molecules 
>>  using the below strategy:
>> 
>>  from rdkit import Chem
>>  from rdkit.Chem import AllChem
>>  from rdkit.Chem import SaltRemover
>> 
>>  remover=SaltRemover.SaltRemover()
>>  mol=Chem.MolFromSmiles('CCC.CC')
>>  res=remover.StripMol(mol)
>>  print(res.GetNumAtoms())
>> 
>> 
>>  I am getting 5 atoms as output, so the 'CC' is not being stripped (the 
>>  script works OK for salts). Is there any way of filtering out small 
>>  non-salt fragments?
>> 
>>  thank you very much in advance,
>> 
>>  regards,
>> 
>>  Alfredo
>> 
>>  Check out the vibrant tech community on one of the world's most
>>  engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> 
>>  Rdkit-discuss mailing list
>>  Rdkit-discuss@lists.sourceforge.net
>>  https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> 

