Re: [ccp4bb] ChEBI and SMILES

Oliver Smart Fri, 07 Oct 2022 08:48:29 -0700

On 7 Oct 2022, at 10:12, Harry Powell <0000193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

Hi

Probably a silly question, but I was wondering how to search for a ligand in ChEBI with a SMILES string? It’s not immediately obvious to my Friday-morning mind ...

Harry
########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/

Dear Harry,

ChEBI does have an API, but it uses SOAP and I am not familiar with it:

https://www.ebi.ac.uk/chebi/webServices.do

PubChem, however, has a great API, and pubchempy (https://github.com/mcs07/PubChemPy) provides an easy way to interact with it.

PubChem entries appear to have the ChEBI identifier as one of their synonyms. The attached Python script accesses this information.

For instance:

$ ./lookup_smiles_in_pubchem.py "c1ccccc1O"
PubChem_CID:996 ChEBI_ID:15882 (phenol)

$ ./lookup_smiles_in_pubchem.py "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"
PubChem_CID:2519 ChEBI_ID:27732 (caffeine)

$ ./lookup_smiles_in_pubchem.py "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C"
PubChem_CID:135398744 ChEBI_ID:9139 (sildenafil)

Hope this is useful

Oliver


#!/usr/bin/env python
"""
Little script to look up a SMILES string at PubChem
and print out the PubChem CID and the ChEBI identifier.
"""
import argparse

import pubchempy as pcp


# get SMILES from the command line
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('smiles', metavar='SMILES',
                    help='SMILES string to look up')
args = parser.parse_args()
smiles = args.smiles

# get_compounds returns a list; guard against it being empty
# before taking the first hit
compounds = pcp.get_compounds(smiles, 'smiles')
compound = compounds[0] if compounds else None
if compound and compound.cid:
    cid = compound.cid
    synonyms = compound.synonyms
    # the ChEBI identifier appears among the synonyms as 'CHEBI:<number>'
    chebi_ids = [s for s in synonyms if s.startswith('CHEBI:')]
    if chebi_ids:
        chebi_id = chebi_ids[0].replace('CHEBI:', '')
    else:
        chebi_id = None
    # first synonym is used as the display name, if there is one
    name = synonyms[0] if synonyms else None
    print(f'PubChem_CID:{cid} ChEBI_ID:{chebi_id} ({name})')
else:
    print('smiles string not found')
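
The script above only reports the first 'CHEBI:' synonym. As a rough sketch, the synonym filtering could be pulled out into a small function that returns every ChEBI number it finds. The function name chebi_ids_from_synonyms is hypothetical (not part of pubchempy), and the synonym list in the example is illustrative rather than real PubChem output; it assumes only that the identifier appears as 'CHEBI:<number>', as in the script.

```python
def chebi_ids_from_synonyms(synonyms):
    """Return the distinct ChEBI numbers found in a list of synonym strings.

    Hypothetical helper: assumes ChEBI identifiers appear as
    'CHEBI:<number>', as in the script above.
    """
    ids = []
    for s in synonyms:
        if s.upper().startswith('CHEBI:'):
            number = s.split(':', 1)[1].strip()
            # keep only purely numeric identifiers and skip duplicates
            if number.isdigit() and number not in ids:
                ids.append(number)
    return ids


if __name__ == '__main__':
    # illustrative synonym list, not real PubChem output
    example = ['phenol', 'CHEBI:15882', 'carbolic acid', 'CHEBI:15882']
    print(chebi_ids_from_synonyms(example))
```

In the script this would be called as chebi_ids_from_synonyms(compound.synonyms); an empty list means PubChem has no ChEBI synonym for that compound.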
