In this case you should use the hash code for de-duplication, not canonical
SMILES. Anyway, I presume you'll probably still go with the canonical
SMILES route, so...
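
For completeness, here is a minimal sketch of the hash-code route, written in
plain Java so it runs without CDK on the classpath. The Fragment class is a
hypothetical stand-in for CircularFingerprinter.FP (the hashed value plus the
atom indices), and dedupeByHash is just an illustrative name, not a CDK method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HashDedup {

    /** Hypothetical stand-in for CircularFingerprinter.FP: hashed value plus atom indices. */
    static final class Fragment {
        final int   hash;
        final int[] atoms;

        Fragment(int hash, int[] atoms) {
            this.hash  = hash;
            this.atoms = atoms;
        }
    }

    /** Keeps the first fragment seen for each hash value, preserving encounter order. */
    static List<Fragment> dedupeByHash(List<Fragment> fragments) {
        Map<Integer, Fragment> unique = new LinkedHashMap<>();
        for (Fragment f : fragments)
            unique.putIfAbsent(f.hash, f);
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        List<Fragment> frags = List.of(
            new Fragment(42, new int[]{0, 1}),
            new Fragment(42, new int[]{4, 5}),   // same hash, different atoms
            new Fragment(7,  new int[]{2}));
        System.out.println(dedupeByHash(frags).size() + " unique fragments");
    }
}
```

Keying on the hash means two different environments could in principle
collide on the same value; that is the trade-off for not depending on the
SMILES being canonical.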

The radicals are causing odd side effects. It is a bug: basically, we don't
use bond orders when canonicalising. I've been planning on redoing the entire
canonical labelling stack, but I'm not sure whether that will be part of CDK
proper. What's more interesting is that the absolute SMILES (InChI driven)
also has the problem. A quick check with Open Babel (using the same InChI
procedure) shows it gets the same result. It's not really a flaw in the InChI:
the InChI doesn't care about bond orders, so you don't see this effect there.

A workaround is to update your valences to be correct: in toSmiles you
should adjust the hydrogen counts. The bad way of doing this is with atom
typing; the smart way is to do it as you fragment the molecule. Make sure
you "roll back" the hydrogen counts afterwards to avoid accumulating more and
more hydrogens. Out of interest, what did you change in your toSmiles
function? The traversal I wrote is optimal for the way CDK stores its
molecules (e.g. it avoids mol.getConnectedBondsList()).
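
To illustrate the adjust-then-roll-back idea, here is a rough sketch in plain
Java under one reading of the workaround: every bond cut by the fragment
boundary is capped with implicit hydrogens, and the original counts are
restored afterwards. Atom, Bond and withCappedFragment are hypothetical
stand-ins so the example runs without CDK; it is not the code from the gist.

```java
import java.util.Set;
import java.util.function.Supplier;

final class HydrogenRollback {

    /** Hypothetical stand-in for a CDK atom: only the implicit hydrogen count matters here. */
    static final class Atom {
        int implicitH;
        Atom(int implicitH) { this.implicitH = implicitH; }
    }

    /** Hypothetical stand-in for a CDK bond: two atom indices and an integer bond order. */
    static final class Bond {
        final int begin, end, order;
        Bond(int begin, int end, int order) {
            this.begin = begin;
            this.end   = end;
            this.order = order;
        }
    }

    /**
     * Adds 'order' implicit hydrogens to the fragment-side atom of every bond
     * that crosses the fragment boundary, runs the action (e.g. SMILES
     * generation), then restores the original hydrogen counts.
     */
    static <T> T withCappedFragment(Atom[] atoms, Bond[] bonds,
                                    Set<Integer> fragment, Supplier<T> action) {
        int[] saved = new int[atoms.length];
        for (int i = 0; i < atoms.length; i++)
            saved[i] = atoms[i].implicitH;
        try {
            for (Bond b : bonds) {
                boolean beginIn = fragment.contains(b.begin);
                boolean endIn   = fragment.contains(b.end);
                if (beginIn && !endIn) atoms[b.begin].implicitH += b.order;
                else if (endIn && !beginIn) atoms[b.end].implicitH += b.order;
            }
            return action.get();
        } finally {
            // roll back, so repeated calls do not accumulate hydrogens
            for (int i = 0; i < atoms.length; i++)
                atoms[i].implicitH = saved[i];
        }
    }
}
```

With real CDK objects the same pattern would go through
IAtom.getImplicitHydrogenCount() and setImplicitHydrogenCount(). The rollback
matters because the fragment container shares its atoms with the parent
molecule.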

Updated toSmiles here:
https://gist.github.com/johnmay/56c1238eb335c7f8e4ce8d3d399cef40

Open Babel has the same problem:

$ obabel -:'[C]N=[C]' -osmi -xU
==============================
*** Open Babel Warning  in InChI code
  #0 :Accepted unusual valence(s): C(1)
[C]N=[C]
1 molecule converted
$ obabel -:'[C]=N[C]' -osmi -xU
==============================
*** Open Babel Warning  in InChI code
  #0 :Accepted unusual valence(s): C(1)
[C]=N[C]
1 molecule converted


On 25 July 2017 at 15:43, Staffan Arvidsson <staffan.arvids...@gmail.com>
wrote:

> I managed to tweak the toSmiles code a bit to get (almost) the desired
> result. However, the result I get indicates that the SmilesGenerator does
> not produce canonical output. I've tried SmiFlavor Unique, Absolute and
> Canonical together with Stereo, aromatic etc. The result differs from run
> to run, e.g.:
>
> [C]N=[C] vs [C]=N[C]
> [C]=C([C])[O] vs [C]C(=[C])[O]
>
> etc. Is this due to the IAtomContainer setup or is this a bug in the
> SmilesGenerator?
>
> The CircularFingerprinter.getBitFingerprint().asBitString().toString();
> and Integer.toString(CircularFingerprinter.getFP()); were not really what
> I wanted to get.
>
> Best,
> Staffan
>
> 2017-07-21 14:24 GMT+02:00 John Mayfield <john.wilkinson...@gmail.com>:
>
>> Here's how you can convert the atom indices to a SMILES with stereo.
>> 2.1-SNAPSHOT cleans up the stereo API, avoids the cast, and actually makes
>> this a lot easier. It's done quick and dirty here, but you get the idea.
>>
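>> import java.util.HashSet;
>> import java.util.Set;
>> import org.openscience.cdk.exception.CDKException;
>> import org.openscience.cdk.fingerprint.CircularFingerprinter;
>> import org.openscience.cdk.interfaces.IAtom;
>> import org.openscience.cdk.interfaces.IAtomContainer;
>> import org.openscience.cdk.interfaces.IBond;
>> import org.openscience.cdk.interfaces.IStereoElement;
>> import org.openscience.cdk.interfaces.ITetrahedralChirality;
>> import org.openscience.cdk.smiles.SmilesGenerator;
>>
>> public static String toSmiles(CircularFingerprinter.FP fp,
>>                               IAtomContainer mol) throws CDKException
>> {
>>   // collect the fragment's atoms (shared with mol, not cloned)
>>   IAtomContainer part = mol.getBuilder().newAtomContainer();
>>   Set<IAtom>     aset = new HashSet<>();
>>   for (int idx : fp.atoms) {
>>     aset.add(mol.getAtom(idx));
>>     part.addAtom(mol.getAtom(idx));
>>   }
>>   // keep only the bonds with both ends inside the fragment
>>   for (IBond bond : mol.bonds()) {
>>     if (aset.contains(bond.getBegin()) &&
>>         aset.contains(bond.getEnd()))
>>       part.addBond(bond);
>>   }
>>   // keep tetrahedral centres whose focus and all four ligands are in the fragment
>>   for (IStereoElement se : mol.stereoElements()) {
>>     if (se instanceof ITetrahedralChirality) {
>>       ITetrahedralChirality tc = (ITetrahedralChirality) se;
>>       if (aset.contains(tc.getChiralAtom()) &&
>>           aset.contains(tc.getLigands()[0]) &&
>>           aset.contains(tc.getLigands()[1]) &&
>>           aset.contains(tc.getLigands()[2]) &&
>>           aset.contains(tc.getLigands()[3]))
>>         part.addStereoElement(tc);
>>     }
>>   }
>>   return SmilesGenerator.isomeric().create(part);
>> }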
>>
>>
>> On 21 July 2017 at 13:12, John Mayfield <john.wilkinson...@gmail.com>
>> wrote:
>>
>>>> Although this produces bit-fingerprints and not any
>>>> String-representation of the signatures if I'm reading this correctly?
>>>
>>>
>>> Yes, but notice it also gives you the atom indexes; this is much more
>>> powerful than just giving the String. We actually have a utility to get the
>>> SMARTS for the atoms. It won't give you stereo, but it's pretty easy to make
>>> it do that if you were so inclined; it would be easy to output stereo as
>>> SMILES instead of SMARTS:
>>>
>>>
>>>> SmilesParser   smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
>>>> IAtomContainer mol = smipar.parseSmiles("CCCCCC[C@H](C)CO");
>>>> CircularFingerprinter fp = new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP6);
>>>> fp.calculate(mol);
>>>>
>>>> SmartsFragmentExtractor smafrag = new SmartsFragmentExtractor(mol);
>>>>
>>>> for (int i = 0; i < fp.getFPCount(); i++)
>>>>   System.out.println(smafrag.generate(fp.getFP(i).atoms));
>>>
>>>
>>> Result:
>>>
>>>> [CH3v4X4+0]
>>>> [CH2v4X4+0]
>>>> [CH2v4X4+0]
>>>> [CH2v4X4+0]
>>>> [CH2v4X4+0]
>>>> [CH2v4X4+0]
>>>> [CH1v4X4+0]
>>>> [CH3v4X4+0]
>>>> [CH2v4X4+0]
>>>> [OH1v2X2+0]
>>>> [CH3v4X4+0][CH2v4X4+0]
>>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
>>>> [CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>>>> [CH1v4X4+0][CH3v4X4+0]
>>>> [CH1v4X4+0][CH2v4X4+0][OH1v2X2+0]
>>>> [CH2v4X4+0][OH1v2X2+0]
>>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>>> [CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0]
>>>> [CH3v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>>> [CH2v4X4+0][CH2v4X4+0][CH2v4X4+0][CH1v4X4+0]([CH3v4X4+0])[CH2v4X4+0][OH1v2X2+0]
>>>
>>>
>>>
>>>> However, I have done some experiments comparing the circular
>>>> fingerprints of enantiomers and also diastereomers, and they turn out to
>>>> have 1.0 Tanimoto scores.
>>>> What am I doing wrong?
>>>
>>>
>>> Unfortunately, the way it was written, you currently need 2D coordinates.
>>> It's an easy fix if you want to submit the patch: you just need to pull the
>>> tetrahedral rubric out of the IStereoElements. Note that the
>>> IStereoElements are created automatically from 2D/3D coordinates.
>>>
>>>> SmilesParser          smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
>>>> IAtomContainer        mol1 = smipar.parseSmiles("CCCCCC[C@H](C)CO");
>>>> IAtomContainer        mol2 = smipar.parseSmiles("CCCCCC[C@@H](C)CO");
>>>> CircularFingerprinter fp = new CircularFingerprinter(CircularFingerprinter.CLASS_ECFP6);
>>>> System.out.println(Tanimoto.calculate(fp.getFingerprint(mol1),
>>>>                                       fp.getFingerprint(mol2)));
>>>> // 1.0
>>>> StructureDiagramGenerator sdg = new StructureDiagramGenerator();
>>>> sdg.generateCoordinates(mol1);
>>>> sdg.generateCoordinates(mol2);
>>>> System.out.println(Tanimoto.calculate(fp.getFingerprint(mol1),
>>>>                                       fp.getFingerprint(mol2)));
>>>> // 0.77
>>>
>>>
>>>
>>> On 21 July 2017 at 12:25, Christoph Steinbeck <
>>> christoph.steinb...@uni-jena.de> wrote:
>>>
>>>> CircularFingerprinter.getBitFingerprint().asBitString().toString();
>>>>
>>>> or
>>>>
>>>> Integer.toString(CircularFingerprinter.getFP())
>>>>
>>>> Did not test this.
>>>>
>>>> Kind regards,
>>>>
>>>> Chris
>>>>
>>>>
>>>> —
>>>> Prof. Dr. Christoph Steinbeck
>>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>>> Friedrich-Schiller-University Jena, Germany
>>>> Phone Secretariat: +49-3641-948171
>>>> http://orcid.org/0000-0001-6966-0814
>>>>
>>>> What is man but that lofty spirit - that sense of enterprise.
>>>> ... Kirk, "I, Mudd," stardate 4513.3..
>>>>
>>>> > On 21 Jul 02017, at 13:09, Staffan Arvidsson
>>>> > <staffan.arvids...@gmail.com> wrote:
>>>> >
>>>> > OK thanks! Although this produces bit-fingerprints and not any
>>>> > String-representation of the signatures, if I'm reading this correctly?
>>>> > Currently all our code requires the Signatures to be Strings. It would
>>>> > require a large rewrite to get this to work for us. Also, the javadoc says
>>>> > that the method getRawFingerprint is not correct, so I should not use it?
>>>> > (Even though this would be something more like what we want.)
>>>> >
>>>> > Best,
>>>> > Staffan
>>>> >
>>>> > 2017-07-21 11:59 GMT+02:00 John Mayfield <john.wilkinson...@gmail.com>:
>>>> > Yes,
>>>> >
>>>> > Use the CircularFingerprinter; it encodes stereochemistry. The
>>>> > relevant method is CircularFingerprinter.getFP(), which will give you the
>>>> > atoms involved and the hashed value. IIRC the first atom in the list is
>>>> > the 'root'.
>>>> >
>>>> > John
>>>> >
>>>> > On 21 July 2017 at 09:39, Staffan Arvidsson
>>>> > <staffan.arvids...@gmail.com> wrote:
>>>> > Hi all,
>>>> >
>>>> > I wonder if there is any way of producing atom signatures with
>>>> > stereo information? Currently we're using
>>>> >
>>>> > String signature = new AtomSignature(atom, height,
>>>> >     molecule).toCanonicalString();
>>>> >
>>>> > to produce the signatures.
>>>> >
>>>> >
>>>> > Best,
>>>> > Staffan
>>>> >
>>>> > ------------------------------------------------------------------------------
>>>> > Check out the vibrant tech community on one of the world's most
>>>> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>> > _______________________________________________
>>>> > Cdk-user mailing list
>>>> > Cdk-user@lists.sourceforge.net
>>>> > https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>> >
