Re: [Cdk-user] Mol2 file to SMILES

John Mayfield Sun, 28 Mar 2021 04:42:06 -0700

You should use the *CircularFingerprinter* for similarity.
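
As a rough illustration of what the similarity score measures once you have fingerprints: the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. Below is a minimal plain-Java sketch, assuming the fingerprints are available as java.util.BitSet. The class name TanimotoSketch and the handling of two empty fingerprints are assumptions made for this example and are not CDK API; in CDK you would call org.openscience.cdk.similarity.Tanimoto instead.

```java
import java.util.BitSet;

public class TanimotoSketch {

    /** Tanimoto coefficient |A AND B| / |A OR B| of two bit sets. */
    public static double tanimoto(BitSet a, BitSet b) {
        // clone first so the caller's fingerprints are not modified
        BitSet shared = (BitSet) a.clone();
        shared.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // assumption of this sketch: two empty fingerprints score 0.0
        return union == 0 ? 0.0 : (double) shared.cardinality() / union;
    }

    public static void main(String[] args) {
        // made-up bit positions, purely for illustration
        BitSet a = new BitSet();
        a.set(1); a.set(3); a.set(5);
        BitSet b = new BitSet();
        b.set(3); b.set(5); b.set(8);
        System.out.println(tanimoto(a, b));
    }
}
```

Here the two fingerprints share 2 bits out of 4 distinct bits set overall, so the score is 2/4.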

On Sun, 28 Mar 2021 at 08:39, Sub Jae Shin <[email protected]> wrote:


> To John Mayfield
>
> Hi, I found the DrugBank ID among the properties returned by the
> AtomContainer's getProperties method, so I can now tell which atom
> container corresponds to which drug.
>
> I think I have now achieved my goal of computing drug-drug similarity.
>
> package com.company;
> import org.openscience.cdk.ChemFile;
> import org.openscience.cdk.exception.CDKException;
> import org.openscience.cdk.fingerprint.Fingerprinter;
> import org.openscience.cdk.fingerprint.IBitFingerprint;
> import org.openscience.cdk.fingerprint.IFingerprinter;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.interfaces.IChemFile;
> import org.openscience.cdk.io.MDLV2000Reader;
> import org.openscience.cdk.similarity.Tanimoto;
> import org.openscience.cdk.tools.manipulator.ChemFileManipulator;
>
> import java.io.*;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> public class Main {
>
>     public static void main(String[] args) {
>         try {
>
>             InputStream structures = new FileInputStream("../data/drugbank/structures.sdf");
>             MDLV2000Reader reader = new MDLV2000Reader(structures);
>             IChemFile file = reader.read(new ChemFile());
>             reader.close();
>
>             Fingerprinter finger = new Fingerprinter();
>             List<IAtomContainer> atomData = ChemFileManipulator.getAllAtomContainers(file);
>             int count = atomData.size();
>             // One row per drug: its DATABASE_ID followed by its similarity to every drug.
>             List<List<Object>> df = new ArrayList<>();
>
>             for (int i = 0; i < count; ++i) {
>                 List<Object> list = new ArrayList<>();
>                 IAtomContainer acReference = atomData.get(i);
>                 // The DrugBank ID is stored in the "DATABASE_ID" property of each container.
>                 Map<Object, Object> refProperties = acReference.getProperties();
>                 list.add(refProperties.get("DATABASE_ID"));
>                 for (int j = 0; j < count; ++j) {
>                     IAtomContainer acStructure = atomData.get(j);
>                     Map<Object, Object> structProperties = acStructure.getProperties();
>                     System.out.println("REF DATABASE_ID : " + refProperties.get("DATABASE_ID")
>                             + " - COMP DATABASE_ID : " + structProperties.get("DATABASE_ID")
>                             + " similarity is now calculating....");
>                     double similarity = cdkCalculateTanimotoCoef(finger, acReference, acStructure);
>                     list.add(similarity);
>                 }
>                 df.add(list);
>             }
>
>             // try-with-resources closes the CSV file even if a write fails
>             try (FileWriter resultCsv = new FileWriter("../data/drugbank/drug_drug_sim.csv")) {
>                 for (List<Object> row : df) {
>                     StringBuilder line = new StringBuilder();
>                     for (int i = 0; i < row.size(); ++i) {
>                         if (i > 0) {
>                             line.append(',');
>                         }
>                         line.append(row.get(i));
>                     }
>                     line.append('\n');
>                     resultCsv.write(line.toString());
>                 }
>             }
>
>         } catch (FileNotFoundException | CDKException e) {
>             System.out.println(e.getMessage());
>         } catch (IOException e) {
>             e.printStackTrace();
>         }
>     }
>
>     public static double cdkCalculateTanimotoCoef(IFingerprinter fingerprinter,
>             IAtomContainer acReference, IAtomContainer acStructure) {
>
>         double ret = 0.0;
>
>         try {
>
>             IBitFingerprint fpReference = fingerprinter.getBitFingerprint(acReference);
>
>             //Tanimoto-score
>             IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(acStructure);
>             ret = Tanimoto.calculate(fpReference, fpStructure);
>
>         } catch (Exception ex) {
>             //...
>         }
>
>         return ret;
>     }
> }
>
>
> I hope the output of this code matches my goal.
>
> Thank you all, as always, CDK developers.
>
> Sincerely
> Seopjae Shin
>
>
> On Fri, Mar 26, 2021 at 6:36 PM John Mayfield <[email protected]>
> wrote:
>
>> Do you have a Mol2 file or a SMILES file? It's not clear. Mol2 support
>> isn't great in the CDK, mainly because it's more of a compchem/modelling
>> format than a cheminformatics one; cheminformatics primarily uses SMILES
>> or MOLfile.
>>
>> Presuming you know how to read a file line by line, here is an example
>> starting from SMILES:
>>
>>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>> // load from SMILES and compute the ECFP (circular) fingerprint
>>> IFingerprinter fpr = new CircularFingerprinter();
>>> SmilesParser smipar = new SmilesParser(bldr);
>>> List<String> smiles = Arrays.asList("Clc1ccccc1",
>>>         "Fc1ccccc1",
>>>         "Ic1ccccc1",
>>>         "Clc1ncccc1");
>>> List<BitSet> fps = new ArrayList<>();
>>> for (String smi : smiles) {
>>>     IAtomContainer mol = smipar.parseSmiles(smi);
>>>     fps.add(fpr.getBitFingerprint(mol).asBitSet());
>>> }
>>> // print N^2 comparison table
>>> for (int j = 0; j < fps.size(); j++)
>>>     System.out.print("," + smiles.get(j));
>>> System.out.print('\n');
>>> for (int i = 0; i < fps.size(); i++) {
>>>     System.out.print(smiles.get(i));
>>>     for (int j = 0; j < fps.size(); j++) {
>>>         System.out.printf(",%.3f", Tanimoto.calculate(fps.get(i), fps.get(j)));
>>>     }
>>>     System.out.print('\n');
>>> }
>>
>>
>> ,Clc1ccccc1,Fc1ccccc1,Ic1ccccc1,Clc1ncccc1
>> Clc1ccccc1,1.000,0.368,0.368,0.292
>> Fc1ccccc1,0.368,1.000,0.368,0.192
>> Ic1ccccc1,0.368,0.368,1.000,0.192
>> Clc1ncccc1,0.292,0.192,0.192,1.000
>>
>> There are much more efficient ways of doing this; for a large comparison
>> table, use ChemFP: https://chemfp.com/.
>>
>> On Wed, 24 Mar 2021 at 06:42, Stesycki, Manuel <
>> [email protected]> wrote:
>>
>>> Good morning,
>>>
>>> Use this class for Tanimoto calculations:
>>>  org.openscience.cdk.similarity.Tanimoto (see doc:
>>> http://cdk.github.io/cdk/latest/docs/api/index.html)
>>>
>>> You could do something like this to calculate your Tanimoto score:
>>>
>>> public static double cdkCalculateTanimotoCoef(IFingerprinter fingerprinter,
>>>         IAtomContainer acReference, IAtomContainer acStructure) {
>>>
>>>         double ret = 0.0;
>>>
>>>         try {
>>>
>>>             IBitFingerprint fpReference = fingerprinter.getBitFingerprint(acReference);
>>>
>>>             //Tanimoto-score
>>>             IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(acStructure);
>>>             ret = Tanimoto.calculate(fpReference, fpStructure);
>>>
>>>         } catch (Exception ex) {
>>>             //...
>>>         }
>>>
>>>         return ret;
>>>     }
>>>
>>>
>>>
>>> Best regards,
>>>    Manuel Stesycki
>>>
>>> IT
>>>    0208 / 306-2146
>>>    Physikbau, Büro 117
>>>    [email protected]
>>>
>>> Max-Planck-Institut für Kohlenforschung
>>>    Kaiser-Wilhelm-Platz 1
>>>    D-45470 Mülheim an der Ruhr
>>>    http://www.kofo.mpg.de/de
>>>
>>> On 24.03.2021 at 04:55, Sub Jae Shin <[email protected]> wrote:
>>>
>>> To CDK developers.
>>>
>>> Hello, I'm trying to compute drug-drug similarity using the Tanimoto score.
>>>
>>> I'm a beginner with CDK and Java, and I'm stuck on converting a SMILES
>>> file into the arguments that the Tanimoto class's calculate method expects.
>>>
>>> package com.company;
>>> import org.openscience.cdk.ChemFile;
>>> import org.openscience.cdk.exception.CDKException;
>>> import org.openscience.cdk.interfaces.IChemFile;
>>> import org.openscience.cdk.io.SMILESReader;
>>> import java.io.*;
>>>
>>> public class Main {
>>>
>>>     public static void main(String[] args) {
>>>         try {
>>>
>>>             InputStream mol2DataStream = new FileInputStream("../data/drugbank/structure.smiles");
>>>             SMILESReader reader = new SMILESReader(mol2DataStream);
>>>             IChemFile file = reader.read(new ChemFile());
>>>
>>>         } catch (FileNotFoundException | CDKException e) {
>>>             System.out.println(e.getMessage());
>>>         }
>>>     }
>>> }
>>>
>>> Sincerely
>>> Seopjae Shin.
>>>
>>>
>>> _______________________________________________
>>> Cdk-user mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>>
>>
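
On the point quoted above that there are more efficient ways to build the N^2 table: since the Tanimoto score is symmetric, one simple saving is to fingerprint each structure once and score each unordered pair once, then mirror the value. The following is a hedged plain-Java sketch of that idea on java.util.BitSet fingerprints. The class SimilarityMatrixSketch, its method names, the IDs and the bit patterns are all made up for illustration and are not CDK API.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;

public class SimilarityMatrixSketch {

    /** Tanimoto coefficient of two bit sets; 0.0 when both are empty (an assumption). */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        int shared = and.cardinality();
        int union = a.cardinality() + b.cardinality() - shared;
        return union == 0 ? 0.0 : (double) shared / union;
    }

    /** Fills the full N x N matrix while scoring each unordered pair only once. */
    public static double[][] matrix(List<BitSet> fps) {
        int n = fps.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double t = tanimoto(fps.get(i), fps.get(j));
                sim[i][j] = t;
                sim[j][i] = t; // symmetric: T(a, b) == T(b, a)
            }
        }
        return sim;
    }

    /** Renders one CSV row per entry: the ID followed by its N scores. */
    public static String toCsv(List<String> ids, double[][] sim) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sim.length; i++) {
            out.append(ids.get(i));
            for (double v : sim[i]) {
                // Locale.ROOT keeps the decimal separator a dot in every locale
                out.append(',').append(String.format(Locale.ROOT, "%.3f", v));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // hypothetical fingerprints and IDs, not real DrugBank data
        BitSet a = new BitSet();
        a.set(1); a.set(3); a.set(5);
        BitSet b = new BitSet();
        b.set(3); b.set(5); b.set(8);
        double[][] sim = matrix(Arrays.asList(a, b));
        System.out.print(toCsv(Arrays.asList("ID_A", "ID_B"), sim));
    }
}
```

This roughly halves the number of comparisons and avoids recomputing fingerprints inside the inner loop; for very large tables a dedicated tool is still likely to be the better choice.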
