I managed to tweak the toSmiles code a bit to get (almost) the desired
result. However, the output suggests that the SmilesGenerator does not
produce canonical output. I've tried SmiFlavor Unique, Absolute and
Canonical together with Stereo, aromatic etc., but the result still differs
from run to run, e.g.:
[C]N=[C] vs [C]=N[C]
[C]=C([C])[O] vs [C]C(=[C])[O]
etc. Is this due to the IAtomContainer setup, or is this a bug in the
SmilesGenerator?
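To pin down whether the variation already shows up within a single JVM run,
something like the following could be wrapped around the toSmiles call. This
is only a sketch: StabilityCheck and distinctOutputs are made-up names, and
the alternating generator in main is a stand-in for the real call, not CDK
behaviour. It only detects variation inside one process; differences between
separate runs would still need the outputs to be logged and compared.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

final class StabilityCheck {

    /** Calls the generator 'runs' times and returns the distinct strings it produced. */
    static Set<String> distinctOutputs(Supplier<String> generator, int runs) {
        Set<String> seen = new LinkedHashSet<>();
        for (int i = 0; i < runs; i++) {
            seen.add(generator.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Stand-in generator (assumption): alternates between two spellings
        // of the same fragment to mimic an unstable SMILES writer.
        int[] calls = {0};
        Supplier<String> unstable =
                () -> (calls[0]++ % 2 == 0) ? "[C]N=[C]" : "[C]=N[C]";

        Set<String> outputs = distinctOutputs(unstable, 10);
        // More than one entry means the output is not stable.
        System.out.println(outputs); // prints [[C]N=[C], [C]=N[C]]
    }
}
```

In real use the Supplier would be something like () -> toSmiles(fp, mol); a
result set of size one would point away from the generator and towards how
the fragment container is built.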
The suggested CircularFingerprinter.getBitFingerprint().asBitString().toString()
and Integer.toString(CircularFingerprinter.getFP()) were not really what I
wanted to get.
Best,
Staffan
2017-07-21 14:24 GMT+02:00 John Mayfield <[email protected]>:
> Here's how you can convert the atom indices to a SMILES with stereo.
> 2.1-SNAPSHOT cleans up the stereo API, avoids the cast and actually makes
> this a lot easier. Done quick and dirty here, but you get the idea.
>
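> import java.util.HashSet;
> import java.util.Set;
> import org.openscience.cdk.exception.CDKException;
> import org.openscience.cdk.fingerprint.CircularFingerprinter;
> import org.openscience.cdk.interfaces.IAtom;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.interfaces.IBond;
> import org.openscience.cdk.interfaces.IStereoElement;
> import org.openscience.cdk.interfaces.ITetrahedralChirality;
> import org.openscience.cdk.smiles.SmilesGenerator;
>
> public static String toSmiles(CircularFingerprinter.FP fp, IAtomContainer mol)
>         throws CDKException {
>     IAtomContainer part = mol.getBuilder().newAtomContainer();
>     // Collect the atoms that belong to this fingerprint feature
>     Set<IAtom> aset = new HashSet<>();
>     for (int idx : fp.atoms) {
>         aset.add(mol.getAtom(idx));
>         part.addAtom(mol.getAtom(idx));
>     }
>     // Keep only bonds with both ends inside the fragment
>     for (IBond bond : mol.bonds()) {
>         if (aset.contains(bond.getBegin()) && aset.contains(bond.getEnd()))
>             part.addBond(bond);
>     }
>     // Carry over tetrahedral centres whose focus and all four ligands
>     // are inside the fragment
>     for (IStereoElement se : mol.stereoElements()) {
>         if (se instanceof ITetrahedralChirality) {
>             ITetrahedralChirality tc = (ITetrahedralChirality) se;
>             if (aset.contains(tc.getChiralAtom()) &&
>                 aset.contains(tc.getLigands()[0]) &&
>                 aset.contains(tc.getLigands()[1]) &&
>                 aset.contains(tc.getLigands()[2]) &&
>                 aset.contains(tc.getLigands()[3]))
>                 part.addStereoElement(tc);
>         }
>     }
>     return SmilesGenerator.isomeric().create(part);
> }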
>
>
> On 21 July 2017 at 13:12, John Mayfield <[email protected]>
> wrote:
>
>>> Although this produces bit-fingerprints and not any
>>> String-representation of the signatures if I'm reading this correctly?
>>
>>
>> Yes, but notice it also gives you the atom indexes, which is much more
>> powerful than just giving the String. We actually have a utility to get the
>> SMARTS for the atoms. It won't give you stereo, but it's pretty easy to make
>> it do that if you were so inclined; it would be easy to output stereo as
>> SMILES instead of SMARTS:
>>
>>
>> SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
>> IAtomContainer mol = smipar.parseSmiles("CCCCCC[C@H](C)CO");
>> CircularFingerprinter fp = new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP6);
>> fp.calculate(mol);
>>
>> SmartsFragmentExtractor smafrag = new SmartsFragmentExtractor(mol);
>>
>> for (int i = 0; i < fp.getFPCount(); i++)
>>     System.out.println(smafrag.generate(fp.getFP(i).atoms));
>>
>>
>> Result:
>>
>> [CH3v4X4+0]
>>> [CH2v4X4+0]
>>> [CH2v4X4+0]
>>> [CH2v4X4+0]
>>> [CH2v4X4+0]
>>> [CH2v4X4+0]
>>> [CH1v4X4+0]
>>> [CH3v4X4+0]
>>> [CH2v4X4+0]
>>> [OH1v2X2+0]
>>> [CH3v4X4+0][CH2v4X4+0]
>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
>>> [CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>>> [CH1v4X4+0][CH3v4X4+0]
>>> [CH1v4X4+0][CH2v4X4+0][OH1v2X2+0]
>>> [CH2v4X4+0][OH1v2X2+0]
>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>> [CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][
>>> CH2v4X4+0][CH1v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][
>>> CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([
>>> CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[
>>> CH2v4X4+0][OH1v2X2+0]
>>
>>
>>
>>> However, I have done some experiments comparing the circular fingerprints
>>> of enantiomers and also diastereomers, and they turn out to have 1.0
>>> Tanimoto scores.
>>> What am I doing wrong?
>>
>>
>> Unfortunately, the way it was written you currently need 2D coordinates.
>> It's an easy fix if you want to submit the patch: you just need to pull the
>> tetrahedral rubric out of the IStereoElements. Note that the IStereoElements
>> are created automatically on 2D/3D input.
>>
>> SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
>> IAtomContainer mol1 = smipar.parseSmiles("CCCCCC[C@H](C)CO");
>> IAtomContainer mol2 = smipar.parseSmiles("CCCCCC[C@@H](C)CO");
>> CircularFingerprinter fp = new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP6);
>> System.out.println(Tanimoto.calculate(fp.getFingerprint(mol1), fp.getFingerprint(mol2)));
>> // 1.0
>> StructureDiagramGenerator sdg = new StructureDiagramGenerator();
>> sdg.generateCoordinates(mol1);
>> sdg.generateCoordinates(mol2);
>> System.out.println(Tanimoto.calculate(fp.getFingerprint(mol1), fp.getFingerprint(mol2)));
>> // 0.77
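For reference, a score of 1.0 means the two bit sets are identical, i.e.
without coordinates the stereocentre contributes no distinguishing bit. The
coefficient itself is just |A AND B| / |A OR B|. A minimal plain-Java sketch
of that formula follows; TanimotoSketch and its helper are made-up names, the
bit positions are invented for illustration, and this is not CDK's Tanimoto
class.

```java
import java.util.BitSet;

final class TanimotoSketch {

    /** Tanimoto coefficient |A AND B| / |A OR B| of two bit fingerprints. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Convention chosen for this sketch: two empty fingerprints count as identical.
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }

    /** Builds a BitSet with the given positions set. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet a = bits(1, 5, 9, 12);
        BitSet same = bits(1, 5, 9, 12);   // no bit differs
        BitSet other = bits(1, 5, 9, 40);  // one bit differs on each side

        System.out.println(tanimoto(a, same));  // prints 1.0
        System.out.println(tanimoto(a, other)); // prints 0.6 (3 shared / 5 total)
    }
}
```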
>>
>>
>>
>> On 21 July 2017 at 12:25, Christoph Steinbeck <
>> [email protected]> wrote:
>>
>>> CircularFingerprinter.getBitFingerprint().asBitString().toString();
>>>
>>> or
>>>
>>> Integer.toString(CircularFingerprinter.getFP())
>>>
>>> Did not test this.
>>>
>>> Kind regards,
>>>
>>> Chris
>>>
>>>
>>> —
>>> Prof. Dr. Christoph Steinbeck
>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>> Friedrich-Schiller-University Jena, Germany
>>> Phone Secretariat: +49-3641-948171
>>> http://orcid.org/0000-0001-6966-0814
>>>
>>> What is man but that lofty spirit - that sense of enterprise.
>>> ... Kirk, "I, Mudd," stardate 4513.3..
>>>
>>> > On 21 Jul 2017, at 13:09, Staffan Arvidsson <[email protected]> wrote:
>>> >
>>> > OK thanks! Although this produces bit-fingerprints and not any
>>> > String-representation of the signatures, if I'm reading this correctly?
>>> > Currently all our code requires the Signatures to be Strings, so it would
>>> > require a large rewrite to get this to work for us. The javadoc says that
>>> > the method getRawFingerprint is not correct, so I should not use it? (Even
>>> > though this would be something more like what we want.)
>>> >
>>> > Best,
>>> > Staffan
>>> >
>>> > 2017-07-21 11:59 GMT+02:00 John Mayfield <[email protected]>:
>>> > Yes,
>>> >
>>> > Use the CircularFingerprinter, it encodes stereochemistry. The
>>> > relevant method is CircularFingerprinter.getFP(), which will give you the
>>> > atoms involved and the hashed value. IIRC the first atom in the list is the
>>> > 'root'.
>>> >
>>> > John
>>> >
>>> > On 21 July 2017 at 09:39, Staffan Arvidsson <[email protected]> wrote:
>>> > Hi all,
>>> >
>>> > I wonder if there is any way of producing atom signatures with
>>> > stereo information? Currently we're using
>>> >
>>> > String signature = new AtomSignature(atom, height, molecule).toCanonicalString();
>>> >
>>> > to produce the signatures.
>>> >
>>> >
>>> > Best,
>>> > Staffan
>>> >
>>> > ------------------------------------------------------------------------------
>>> > Check out the vibrant tech community on one of the world's most
>>> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> > _______________________________________________
>>> > Cdk-user mailing list
>>> > [email protected]
>>> > https://lists.sourceforge.net/lists/listinfo/cdk-user
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>