Hi Egon,

I have tried the getFingerprint() method, but the output is still not a series of fixed-length integers. The integers still differ in length, for example 9 digits versus 10 digits.
Hi John,

I have modified the code from ans4 += result0.getHash(k); to ans4 += Integer.toHexString(result0.getHash(k));. Below is my result for ethanol's FCFP4:

FCFP4 Ethanol:
b7bc5856 1
0 2
3 1
2583cb3b 1
31282af8 1

After padding:

FCFP4 Ethanol:
b7bc5856 1
00000000 2
00000003 1
2583cb3b 1
31282af8 1

Can you please tell me whether I understood your previous email correctly?

Thank you both very much for your attention to this matter.

Best regards,
Woon Yee

From: John Mayfield <john.wilkinson...@gmail.com>
Sent: Monday, 25 July, 2022 6:33 PM
To: Egon Willighagen <egon.willigha...@gmail.com>
Cc: #NG WOON YEE# <ngwo0...@e.ntu.edu.sg>; cdk-user@lists.sourceforge.net; Chong Kim San Allen <kimsanallen.ch...@ntu.edu.sg>
Subject: Re: [Cdk-user] Generation of FCFP Fingerprint

Hi Woon Yee,

The method is correct; you can emit them as hexadecimal and pad with 0.

John

On Mon, 25 Jul 2022 at 10:11, Egon Willighagen <egon.willigha...@gmail.com> wrote:

Dear Woon Yee,

you can use the getFingerprint() method instead.

Egon

On Mon, 25 Jul 2022 at 10:59, #NG WOON YEE# via Cdk-user <cdk-user@lists.sourceforge.net> wrote:

Dear Helpdesk,

I was using CDK (version 2.7) to generate FCFP4 and FCFP6 fingerprints for the compounds butyramide (https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL1231396.sdf) and ethanol (https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL545.sdf) from their MolFiles, which I got from ChEMBL.
I was using the following commands in CDK:

---------------------------------------------------------------------------------------------------------------
package ecfp;

import java.io.FileInputStream;
import java.io.IOException;

import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.fingerprint.CircularFingerprinter;
import org.openscience.cdk.fingerprint.ICountFingerprint;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.io.MDLV2000Reader;
import org.openscience.cdk.silent.SilentChemObjectBuilder;

public class main {
    public static void main(String[] args) throws CDKException, IOException {
        // Read a single molecule from the MolFile downloaded from ChEMBL
        String filename = "C:\\Users\\NGWO0001\\Downloads\\CHEMBL545.sdf.txt";
        FileInputStream in = new FileInputStream(filename);
        MDLV2000Reader reader = new MDLV2000Reader(in);
        IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
        IAtomContainer mol = reader.read(bldr.newAtomContainer());

        // Generate the FCFP4 count fingerprint
        CircularFingerprinter fingerprinter0 = new CircularFingerprinter(
                CircularFingerprinter.CLASS_FCFP4
        );
        System.out.println("FCFP4 Ethanol:");
        ICountFingerprint result0 = fingerprinter0.getCountFingerprint(mol);

        // Print one "hash count" pair per populated bin
        for (int k = 0, n = result0.numOfPopulatedbins(); k < n; ++k) {
            String ans4 = "";
            ans4 += result0.getHash(k);
            ans4 += " " + result0.getCount(k);
            System.out.printf("%s\n", ans4);
        }
        reader.close();
    }
}
---------------------------------------------------------------------------------------------------------------

The results I got were:

FCFP4 Butyramide:
-1393198889 1
-1212393386 1
-1131767167 2
0 4
2 1
3 1
425233353 1
785469695 1
824716024 1
994111779 1
1429107614 1

FCFP6 Butyramide:
-1393198889 1
-1212393386 1
-1131767167 2
0 4
2 1
3 1
425233353 1
785469695 1
824716024 1
994111779 1
1429107614 1

FCFP4 Ethanol:
-1212393386 1
0 2
3 1
629394235 1
824716024 1

FCFP6 Ethanol:
-1212393386 1
0 2
3 1
629394235 1
824716024 1

I think these results may not be right, since I thought that fingerprints are supposed to be a series of hashes and so ought to be a series of fixed-length integers. However, as you can see in the results I got, for example in the FCFP6 for ethanol, one value is 10 digits long while the others are single digits or 9 digits long.

Can you please tell me what I am doing wrong?

Thank you in advance for your assistance and time.

Best regards,
Woon Yee

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user

--
----
Super happy with this new eLife paper describing an Open Science project where we discuss 260 thousand natural products and where they came from, all 700 thousand pairs linked to their primary literature: "The LOTUS initiative for open knowledge management in natural products research", https://doi.org/10.7554/elife.70780
-----
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Twitter/Mastodon: @egonwillighagen (https://twitter.com/egonwillighagen) / @egonw (https://scholar.social/@egonw)
Homepage: http://egonw.github.io/
Blog: http://chem-bla-ics.blogspot.com/
PubList: https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286 (http://orcid.org/0000-0001-7542-0286)
ImpactStory: https://impactstory.org/u/egonwillighagen
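
The suggestion made in this thread (emit each hash as hexadecimal and pad with 0) can be sketched in plain Java without CDK on the classpath. This is only a minimal illustration under stated assumptions: the class name PaddedHashes and the helper toPaddedHex are hypothetical names chosen for this example, and the hash/count pairs are copied from the FCFP4 ethanol output quoted in the thread rather than recomputed. It relies on String.format with "%08x", which renders an int as unsigned hexadecimal, left-padded with zeros to eight digits.

```java
// Hypothetical helper class for illustration; not part of CDK.
final class PaddedHashes {

    // Render a 32-bit hash as exactly eight lowercase hex digits.
    // Negative values are shown in their unsigned two's-complement form.
    static String toPaddedHex(int hash) {
        return String.format("%08x", hash);
    }

    public static void main(String[] args) {
        // {hash, count} pairs copied from the FCFP4 ethanol output in this thread
        int[][] bins = {
            {-1212393386, 1},
            {0, 2},
            {3, 1},
            {629394235, 1},
            {824716024, 1}
        };
        for (int[] bin : bins) {
            System.out.println(toPaddedHex(bin[0]) + " " + bin[1]);
        }
    }
}
```

In the original program, the equivalent change would presumably be to replace Integer.toHexString(result0.getHash(k)) with String.format("%08x", result0.getHash(k)), so that small hashes such as 0 and 3 are printed at the same width as the others.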