Thanks for the tip!

On Sun, Mar 28, 2021 at 8:41 PM John Mayfield <john.wilkinson...@gmail.com> wrote:
> You should use the *CircularFingerprinter* for similarity.
>
> On Sun, 28 Mar 2021 at 08:39, Sub Jae Shin <cnb.mons...@gmail.com> wrote:
>
>> To John Mayfield
>>
>> Hi, I found the DrugBank ID property through AtomContainer's getProperties
>> method, so I can now tell which atom container corresponds to which drug.
>>
>> I think this achieves my goal of getting drug-drug similarity.
>>
>> package com.company;
>> import org.openscience.cdk.ChemFile;
>> import org.openscience.cdk.exception.CDKException;
>> import org.openscience.cdk.fingerprint.Fingerprinter;
>> import org.openscience.cdk.fingerprint.IBitFingerprint;
>> import org.openscience.cdk.fingerprint.IFingerprinter;
>> import org.openscience.cdk.graph.rebond.Bspt;
>> import org.openscience.cdk.interfaces.IAtomContainer;
>> import org.openscience.cdk.interfaces.IChemFile;
>> import org.openscience.cdk.io.MDLV2000Reader;
>> import org.openscience.cdk.similarity.Tanimoto;
>> import org.openscience.cdk.tools.manipulator.ChemFileManipulator;
>>
>> import java.io.*;
>> import java.lang.reflect.Array;
>> import java.util.ArrayList;
>> import java.util.List;
>> import java.util.Map;
>>
>> public class Main {
>>
>>     public static void main(String[] args) {
>>         try {
>>
>>             InputStream structures = new FileInputStream("../data/drugbank/structures.sdf");
>>             MDLV2000Reader reader = new MDLV2000Reader(structures);
>>             IChemFile file = reader.read(new ChemFile());
>>             //Where can I find drugbank id?
>>
>>             Fingerprinter finger = new Fingerprinter();
>>             List<IAtomContainer> AtomData = ChemFileManipulator.getAllAtomContainers(file);
>>             int count = AtomData.size();
>>             ArrayList<ArrayList> df = new ArrayList<>();
>>
>>             for(int i = 0; i < count; ++i) {
>>                 ArrayList<Object> list = new ArrayList<>();
>>                 IAtomContainer acReference = AtomData.get(i);
>>                 Map refProperties = acReference.getProperties();
>>                 list.add(refProperties.get("DATABASE_ID"));
>>                 for(int j = 0; j < count; ++j) {
>>                     IAtomContainer acStructure = AtomData.get(j);
>>                     Map structProperties = acStructure.getProperties();
>>                     System.out.println("REF DATABASE_ID : " + refProperties.get("DATABASE_ID") +
>>                             "-" + "COMP DATABASE_ID" + structProperties.get("DATABASE_ID") +
>>                             " similarity is now calculating....");
>>                     double similarity = cdkCalculateTanimotoCoef(finger, acReference, acStructure);
>>                     list.add(similarity);
>>                 }
>>                 df.add(list);
>>             }
>>             FileWriter result_csv = new FileWriter("../data/drugbank/drug_drug_sim.csv");
>>
>>             for(ArrayList a : df){
>>                 String row = "";
>>                 for(int i = 0; i < a.size(); ++i) {
>>                     if(i == a.size() - 1) {
>>                         row = row + a.get(i).toString() + "\n";
>>                     }
>>                     else {
>>                         row = row + a.get(i).toString() + ",";
>>                     }
>>                 }
>>                 // System.out.println(row);
>>                 result_csv.write(row);
>>             }
>>
>>             result_csv.close();
>>
>>             //System.out.println(acReference.toString());
>>
>>
>>         } catch (FileNotFoundException | CDKException e) {
>>             System.out.println(e.getMessage());
>>         } catch (IOException e) {
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     public static double cdkCalculateTanimotoCoef(IFingerprinter fingerprinter,
>>             IAtomContainer acReference, IAtomContainer acStructure ) {
>>
>>         double ret = 0.0;
>>
>>         try {
>>
>>             IBitFingerprint fpReference = fingerprinter.getBitFingerprint(acReference);
>>
>>             //Tanimoto-score
>>             IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(acStructure);
>>             ret = Tanimoto.calculate(fpReference, fpStructure);
>>
>>         } catch (Exception ex) {
>>             //...
>>         }
>>
>>         return ret;
>>     }
>> }
>>
>>
>> I hope the result of this code matches my goal.
>>
>> Thank you, as always, to all the CDK developers.
>>
>> Sincerely
>> Seopjae Shin
>>
>>
>> On Fri, Mar 26, 2021 at 6:36 PM John Mayfield <john.wilkinson...@gmail.com> wrote:
>>
>>> Do you have a mol2 file or a SMILES file? It's not clear. Mol2 support
>>> isn't great in the CDK, mainly because it's more a compchem/modelling format
>>> than a cheminformatics one; cheminformatics primarily uses SMILES or MOLfile.
>>>
>>> Presuming you know how to read line by line from a file, here is an example
>>> from SMILES:
>>>
>>>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>>> // load from SMILES and compute the ECFP (circular) fingerprint
>>>> IFingerprinter fpr = new CircularFingerprinter();
>>>> SmilesParser smipar = new SmilesParser(bldr);
>>>> List<String> smiles = Arrays.asList("Clc1ccccc1",
>>>>                                     "Fc1ccccc1",
>>>>                                     "Ic1ccccc1",
>>>>                                     "Clc1ncccc1");
>>>> List<BitSet> fps = new ArrayList<>();
>>>> for (String smi : smiles) {
>>>>     IAtomContainer mol = smipar.parseSmiles(smi);
>>>>     fps.add(fpr.getBitFingerprint(mol).asBitSet());
>>>> }
>>>> // print N^2 comparison table
>>>> for (int j = 0; j < fps.size(); j++)
>>>>     System.out.print("," + smiles.get(j));
>>>> System.out.print('\n');
>>>> for (int i = 0; i < fps.size(); i++) {
>>>>     System.out.print(smiles.get(i));
>>>>     for (int j = 0; j < fps.size(); j++) {
>>>>         System.out.printf(",%.3f", Tanimoto.calculate(fps.get(i), fps.get(j)));
>>>>     }
>>>>     System.out.print('\n');
>>>> }
>>>
>>>
>>> ,Clc1ccccc1,Fc1ccccc1,Ic1ccccc1,Clc1ncccc1
>>> Clc1ccccc1,1.000,0.368,0.368,0.292
>>> Fc1ccccc1,0.368,1.000,0.368,0.192
>>> Ic1ccccc1,0.368,0.368,1.000,0.192
>>> Clc1ncccc1,0.292,0.192,0.192,1.000
>>>
>>> There are far more efficient ways of doing this, and for a large comparison
>>> table use ChemFP: https://chemfp.com/.
>>>
>>> On Wed, 24 Mar 2021 at 06:42, Stesycki, Manuel <stesy...@mpi-muelheim.mpg.de> wrote:
>>>
>>>> Good morning,
>>>>
>>>> Use this class for Tanimoto calculations:
>>>> org.openscience.cdk.similarity.Tanimoto (see doc:
>>>> http://cdk.github.io/cdk/latest/docs/api/index.html)
>>>>
>>>> You could do something like this to calculate your Tanimoto score:
>>>>
>>>> public static double cdkCalculateTanimotoCoef(IFingerprinter fingerprinter,
>>>>         IAtomContainer acReference, IAtomContainer acStructure ) {
>>>>
>>>>     double ret = 0.0;
>>>>
>>>>     try {
>>>>
>>>>         IBitFingerprint fpReference = fingerprinter.getBitFingerprint(acReference);
>>>>
>>>>         //Tanimoto-score
>>>>         IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(acStructure);
>>>>         ret = Tanimoto.calculate(fpReference, fpStructure);
>>>>
>>>>     } catch (Exception ex) {
>>>>         //...
>>>>     }
>>>>
>>>>     return ret;
>>>> }
>>>>
>>>>
>>>>
>>>> Best regards,
>>>> Manuel Stesycki
>>>>
>>>> IT
>>>> 0208 / 306-2146
>>>> Physikbau, Büro 117
>>>> stesy...@mpi-muelheim.mpg.de
>>>>
>>>> Max-Planck-Institut für Kohlenforschung
>>>> Kaiser-Wilhelm-Platz 1
>>>> D-45470 Mülheim an der Ruhr
>>>> http://www.kofo.mpg.de/de
>>>>
>>>> On 24.03.2021 at 04:55, Sub Jae Shin <cnb.mons...@gmail.com> wrote:
>>>>
>>>> To CDK developers.
>>>>
>>>> Hello, I'm trying to compute drug-drug similarity using the Tanimoto score.
>>>>
>>>> I'm a beginner with CDK and Java, and I'm stuck on turning a SMILES file
>>>> into the arguments that the Tanimoto calculate method expects.
>>>>
>>>> package com.company;
>>>> import org.openscience.cdk.ChemFile;
>>>> import org.openscience.cdk.exception.CDKException;
>>>> import org.openscience.cdk.interfaces.IChemFile;
>>>> import org.openscience.cdk.io.SMILESReader;
>>>> import java.io.*;
>>>>
>>>> public class Main {
>>>>
>>>>     public static void main(String[] args) {
>>>>         try {
>>>>
>>>>             InputStream mol2DataStream = new FileInputStream("../data/drugbank/structure.smiles");
>>>>             SMILESReader reader = new SMILESReader(mol2DataStream);
>>>>             IChemFile file = reader.read(new ChemFile());
>>>>
>>>>         } catch (FileNotFoundException | CDKException e) {
>>>>             System.out.println(e.getMessage());
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Sincerely
>>>> Seopjae Shin.
>>>>
>>>>
>>>> _______________________________________________
>>>> Cdk-user mailing list
>>>> Cdk-user@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>>
>>>
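
For reference, here is a minimal sketch of the pairwise comparison discussed in this thread, written against plain java.util.BitSet so that it runs without CDK on the classpath. It assumes each fingerprint is computed once per molecule (for example with getBitFingerprint(mol).asBitSet(), as in John's SMILES example) and then reused, rather than recomputed for every pair. The names TanimotoSketch, tanimoto, similarityMatrix, toCsvRow and bits, and the DB-A/DB-B/DB-C identifiers, are hypothetical. The hand-written tanimoto method only stands in for CDK's Tanimoto.calculate, and returning 0.0 for two empty fingerprints is an assumption of this sketch, not documented CDK behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;

class TanimotoSketch {

    /** Tanimoto coefficient of two bit sets: |A AND B| / |A OR B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();   // clone so the inputs are not modified
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Assumption of this sketch: two empty fingerprints score 0.0.
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    /** Fills the full N x N table while computing each pair only once. */
    static double[][] similarityMatrix(List<BitSet> fps) {
        int n = fps.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {      // upper triangle, diagonal included
                double t = tanimoto(fps.get(i), fps.get(j));
                sim[i][j] = t;
                sim[j][i] = t;                 // Tanimoto is symmetric
            }
        }
        return sim;
    }

    /** Builds one CSV row: the identifier followed by the scores. */
    static String toCsvRow(String id, double[] values) {
        StringBuilder row = new StringBuilder(id);
        for (double v : values) {
            row.append(',').append(String.format(Locale.ROOT, "%.3f", v));
        }
        return row.toString();
    }

    /** Helper for building toy fingerprints from bit positions. */
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet();
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        // Toy data standing in for DATABASE_ID values and real fingerprints.
        List<String> ids = Arrays.asList("DB-A", "DB-B", "DB-C");
        List<BitSet> fps = new ArrayList<>();
        fps.add(bits(1, 2, 3));
        fps.add(bits(2, 3, 4));
        fps.add(bits(7, 8));

        double[][] sim = similarityMatrix(fps);
        for (int i = 0; i < ids.size(); i++) {
            System.out.println(toCsvRow(ids.get(i), sim[i]));
        }
    }
}
```

In the toy data the first two fingerprints share two of four distinct bits, so their score is 0.5. With real structures the list would be filled from the fingerprinter instead of bits(...), which halves the number of comparisons and avoids recomputing two fingerprints for every pair.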