You should use the *CircularFingerprinter* for similarity.
On Sun, 28 Mar 2021 at 08:39, Sub Jae Shin <[email protected]> wrote:
> To John Mayfield
>
> Hi, I found the DrugBank ID property through AtomContainer's getProperties()
> method, so I can now tell which atom container corresponds to which drug.
>
> I think I have achieved my goal of computing drug-drug similarity.
>
> package com.company;
> import org.openscience.cdk.ChemFile;
> import org.openscience.cdk.exception.CDKException;
> import org.openscience.cdk.fingerprint.Fingerprinter;
> import org.openscience.cdk.fingerprint.IBitFingerprint;
> import org.openscience.cdk.fingerprint.IFingerprinter;
> import org.openscience.cdk.graph.rebond.Bspt;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.interfaces.IChemFile;
> import org.openscience.cdk.io.MDLV2000Reader;
> import org.openscience.cdk.similarity.Tanimoto;
> import org.openscience.cdk.tools.manipulator.ChemFileManipulator;
>
> import java.io.*;
> import java.lang.reflect.Array;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> public class Main {
>
> public static void main(String[] args) {
> try {
>
> InputStream structures = new
> FileInputStream("../data/drugbank/structures.sdf");
> MDLV2000Reader reader = new MDLV2000Reader(structures);
> IChemFile file = reader.read(new ChemFile());
> //Where can I find drugbank id?
>
> Fingerprinter finger = new Fingerprinter();
> List<IAtomContainer> AtomData =
> ChemFileManipulator.getAllAtomContainers(file);
> int count = AtomData.size();
> ArrayList<ArrayList> df = new ArrayList<>();
>
> for(int i = 0; i < count; ++i) {
> ArrayList<Object> list = new ArrayList<>();
> IAtomContainer acReference = AtomData.get(i);
> Map refProperties = acReference.getProperties();
> list.add(refProperties.get("DATABASE_ID"));
> for(int j = 0; j < count; ++j) {
> IAtomContainer acStructure = AtomData.get(j);
> Map structProperties = acStructure.getProperties();
> System.out.println("REF DATABASE_ID : " +
> refProperties.get("DATABASE_ID") +
> " - COMP DATABASE_ID : " +
> structProperties.get("DATABASE_ID") + " similarity is now calculating....");
> double similarity = cdkCalculateTanimotoCoef(finger,
> acReference, acStructure);
> list.add(similarity);
> }
> df.add(list);
> }
> FileWriter result_csv = new
> FileWriter("../data/drugbank/drug_drug_sim.csv");
>
> for(ArrayList a : df){
> String row = "";
> for(int i = 0; i < a.size(); ++i) {
> if(i == a.size() - 1) {
> row = row + a.get(i).toString() + "\n";
> }
> else {
> row = row + a.get(i).toString() + ",";
> }
> }
> // System.out.println(row);
> result_csv.write(row);
> }
>
> result_csv.close();
>
> //System.out.println(acReference.toString());
>
>
> } catch (FileNotFoundException | CDKException e) {
> System.out.println(e.getMessage());
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
>
> public static double cdkCalculateTanimotoCoef(IFingerprinter
> fingerprinter, IAtomContainer acReference, IAtomContainer acStructure ) {
>
> double ret = 0.0;
>
> try {
>
> IBitFingerprint fpReference =
> fingerprinter.getBitFingerprint(acReference);
>
> //Tanimoto-score
> IBitFingerprint fpStructure =
> fingerprinter.getBitFingerprint(acStructure);
> ret = Tanimoto.calculate(fpReference, fpStructure);
>
> } catch (Exception ex) {
> //...
> }
>
> return ret;
> }
> }
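
The quoted code builds each CSV row by repeated string concatenation and closes the FileWriter by hand. As a rough sketch only (the class name, method name and placeholder IDs below are made up for illustration, and a StringWriter stands in for the FileWriter), the same output could be written with a StringBuilder and try-with-resources:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CsvSketch {

    // Writes one row per drug: its ID followed by its similarity to every drug.
    static void writeCsv(Writer out, List<String> ids, double[][] sim) throws IOException {
        for (int i = 0; i < ids.size(); i++) {
            StringBuilder row = new StringBuilder(ids.get(i));
            for (double value : sim[i]) {
                // Locale.ROOT keeps '.' as the decimal separator on every machine
                row.append(',').append(String.format(Locale.ROOT, "%.3f", value));
            }
            out.write(row.append('\n').toString());
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder IDs and scores, not real DrugBank data
        List<String> ids = Arrays.asList("ID_A", "ID_B");
        double[][] sim = {{1.0, 0.25}, {0.25, 1.0}};
        // try-with-resources closes the writer even if writing fails
        try (Writer out = new StringWriter()) {
            writeCsv(out, ids, sim);
            System.out.print(out);
        }
    }
}
```

For the real file, a FileWriter on the CSV path from the quoted code would replace the StringWriter.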
>
>
> I hope the result of this code matches my goal.
>
> Thank you all, as always, CDK developers.
>
> Sincerely
> Seopjae Shin
>
>
> On Fri, Mar 26, 2021 at 6:36 PM John Mayfield <[email protected]>
> wrote:
>
>> Do you have a mol2 file or a SMILES file? It's not clear. Mol2 support
>> isn't great in the CDK, mainly because it's more of a compchem/modelling
>> format than a cheminformatics one; cheminformatics primarily uses SMILES
>> or MOLfile.
>>
>> Presuming you know how to read a file line by line, here is an example
>> from SMILES:
>>
>>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>> // load from SMILES and compute the ECFP (circular) fingerprint
>>> IFingerprinter fpr = new CircularFingerprinter();
>>> SmilesParser smipar = new SmilesParser(bldr);
>>> List<String> smiles = Arrays.asList("Clc1ccccc1",
>>> "Fc1ccccc1",
>>> "Ic1ccccc1",
>>> "Clc1ncccc1");
>>> List<BitSet> fps = new ArrayList<>();
>>> for (String smi : smiles) {
>>> IAtomContainer mol = smipar.parseSmiles(smi);
>>> fps.add(fpr.getBitFingerprint(mol).asBitSet());
>>> }
>>> // print N^2 comparison table
>>> for (int j = 0; j < fps.size(); j++)
>>> System.out.print("," + smiles.get(j));
>>> System.out.print('\n');
>>> for (int i = 0; i < fps.size(); i++) {
>>> System.out.print(smiles.get(i));
>>> for (int j = 0; j < fps.size(); j++) {
>>> System.out.printf(",%.3f", Tanimoto.calculate(fps.get(i),
>>> fps.get(j)));
>>> }
>>> System.out.print('\n');
>>> }
>>
>>
>> ,Clc1ccccc1,Fc1ccccc1,Ic1ccccc1,Clc1ncccc1
>> Clc1ccccc1,1.000,0.368,0.368,0.292
>> Fc1ccccc1,0.368,1.000,0.368,0.192
>> Ic1ccccc1,0.368,0.368,1.000,0.192
>> Clc1ncccc1,0.292,0.192,0.192,1.000
>>
>> There are more efficient ways of doing this, and for a large comparison
>> table use ChemFP: https://chemfp.com/.
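
One of the cheaper improvements, as a sketch only: the Tanimoto score is symmetric, so each unordered pair needs computing just once and can be mirrored into the table. The class and helper names here are made up, the tanimoto helper is a plain stand-in rather than the CDK class, and the hand-built BitSets stand in for fpr.getBitFingerprint(mol).asBitSet().

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class PairwiseSketch {

    // Stand-in Tanimoto on plain BitSets: shared bits / bits set in either.
    // Assumption: two empty fingerprints score 0.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        int common = both.cardinality();
        int union = a.cardinality() + b.cardinality() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    // Fills an N x N table, computing each unordered pair (i, j) only once.
    static double[][] similarityMatrix(List<BitSet> fps) {
        int n = fps.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double t = tanimoto(fps.get(i), fps.get(j));
                sim[i][j] = t;
                sim[j][i] = t; // mirror into the lower triangle
            }
        }
        return sim;
    }

    // Hypothetical helper: a BitSet with the given positions set.
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) set.set(p);
        return set;
    }

    public static void main(String[] args) {
        List<BitSet> fps = new ArrayList<>();
        fps.add(bits(1, 4, 7, 9));
        fps.add(bits(1, 4, 8));
        double[][] sim = similarityMatrix(fps);
        // 2 shared bits out of 5 distinct bits
        System.out.println(sim[0][1] + " " + sim[1][0]);
    }
}
```

This roughly halves the number of comparisons; it does not replace a dedicated tool for very large tables.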
>>
>> On Wed, 24 Mar 2021 at 06:42, Stesycki, Manuel <
>> [email protected]> wrote:
>>
>>> Good morning,
>>>
>>> Use this class for Tanimoto calculations:
>>> org.openscience.cdk.similarity.Tanimoto (see doc:
>>> http://cdk.github.io/cdk/latest/docs/api/index.html)
>>>
>>> You could do something like this to calculate your Tanimoto score:
>>>
>>> public static double cdkCalculateTanimotoCoef(IFingerprinter
>>> fingerprinter, IAtomContainer acReference, IAtomContainer acStructure ) {
>>>
>>> double ret = 0.0;
>>>
>>> try {
>>>
>>> IBitFingerprint fpReference =
>>> fingerprinter.getBitFingerprint(acReference);
>>>
>>> //Tanimoto-score
>>> IBitFingerprint fpStructure =
>>> fingerprinter.getBitFingerprint(acStructure);
>>> ret = Tanimoto.calculate(fpReference, fpStructure);
>>>
>>> } catch (Exception ex) {
>>> //...
>>> }
>>>
>>> return ret;
>>> }
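
For anyone wondering what the score measures: for two bit fingerprints, the Tanimoto coefficient is the number of bits set in both divided by the number of bits set in either. A minimal sketch on plain java.util.BitSet follows; it is not the CDK implementation, the class and helper names are made up, and returning 0 for two empty fingerprints is my own assumption.

```java
import java.util.BitSet;

class TanimotoSketch {

    // Tanimoto coefficient: (bits set in both) / (bits set in either).
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Assumption: two empty fingerprints are given similarity 0
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }

    // Hypothetical helper: a BitSet with the given positions set.
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) set.set(p);
        return set;
    }

    public static void main(String[] args) {
        BitSet a = bits(1, 4, 7, 9);
        BitSet b = bits(1, 4, 8);
        // Shared bits {1, 4}; bits in either {1, 4, 7, 8, 9}; so 2 / 5
        System.out.println(tanimoto(a, b));
    }
}
```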
>>>
>>>
>>>
>>> Best regards,
>>> Manuel Stesycki
>>>
>>> IT
>>> 0208 / 306-2146
>>> Physics building, office 117
>>> [email protected]
>>>
>>> Max-Planck-Institut für Kohlenforschung
>>> Kaiser-Wilhelm-Platz 1
>>> D-45470 Mülheim an der Ruhr
>>> http://www.kofo.mpg.de/de
>>>
>>> On 24.03.2021 at 04:55, Sub Jae Shin <[email protected]> wrote:
>>>
>>> To CDK developers.
>>>
>>> Hello, I'm trying to compute drug-drug similarity using the Tanimoto score.
>>>
>>> I'm a beginner with CDK and Java, and I'm stuck on how to turn the contents
>>> of a SMILES file into the arguments that the Tanimoto calculate method expects.
>>>
>>> package com.company;
>>> import org.openscience.cdk.ChemFile;
>>> import org.openscience.cdk.exception.CDKException;
>>> import org.openscience.cdk.interfaces.IChemFile;
>>> import org.openscience.cdk.io.SMILESReader;
>>> import java.io.*;
>>>
>>> public class Main {
>>>
>>> public static void main(String[] args) {
>>> try {
>>>
>>> InputStream mol2DataStream = new
>>> FileInputStream("../data/drugbank/structure.smiles");
>>> SMILESReader reader = new SMILESReader(mol2DataStream);
>>> IChemFile file = reader.read(new ChemFile());
>>>
>>> } catch (FileNotFoundException | CDKException e) {
>>> System.out.println(e.getMessage());
>>> }
>>> }
>>> }
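
If the input really is a SMILES file, reading it line by line may be the simplest route. The sketch below uses only the Java standard library and assumes one record per line, with the SMILES string first and an optional title after whitespace; that layout, the class name and the method name are my assumptions, so check them against the actual file. A StringReader stands in for the file here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SmilesLinesSketch {

    // Returns {smiles, title} pairs; the title is "" when the line has none.
    static List<String[]> readSmilesLines(Reader source) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // skip blank lines
                // Split into at most two parts: SMILES, then the rest as title
                String[] parts = line.split("\\s+", 2);
                records.add(new String[] {parts[0], parts.length > 1 ? parts[1] : ""});
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // In real use, pass a FileReader on the .smiles file instead
        Reader demo = new StringReader("Clc1ccccc1 chlorobenzene\n\nFc1ccccc1\n");
        for (String[] rec : readSmilesLines(demo)) {
            System.out.println(rec[0] + " | " + rec[1]);
        }
    }
}
```

Each SMILES string can then be handed to a SmilesParser and a fingerprinter, one molecule at a time.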
>>>
>>> Sincerely
>>> Seopjae Shin.
>>>
>>>
>>> _______________________________________________
>>> Cdk-user mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>>
>>