The OpenBabel toolkit can already generate canonical SMILES of fragments;
it's a key part of the SMILES generator. It keeps the context of each atom
and writes partial aromaticity. For example, if you have Oc1ccccc1O and you
specify atoms 1, 2, 7, and 8, it will write "OccO".

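#include <sstream>
#include <string>
#include <openbabel/mol.h>
#include <openbabel/obconversion.h>
#include <openbabel/bitvec.h>
#include <openbabel/generic.h>

using namespace OpenBabel;

// Read a molecule (here, the example above)
OBConversion conv;
conv.SetInAndOutFormats("smi", "can");
OBMol mol;
conv.ReadString(&mol, "Oc1ccccc1O");
OBMol *pmol = &mol;

// Atom indices are 1-based, so leave room for index NumAtoms()
OBBitVec bv;
bv.Resize(pmol->NumAtoms() + 1);
// Set the bits for the atoms you want (atoms 1, 2, 7 and 8 here)
bv.SetBitOn(1);
bv.SetBitOn(2);
bv.SetBitOn(7);
bv.SetBitOn(8);

// Convert the bitmap to a string
std::stringstream bv_ss;
bv_ss << bv;

// Attach fragment's bitvec string to molecule (pmol takes ownership)
OBPairData *fragment_data = new OBPairData;
fragment_data->SetAttribute("SMILES_Fragment");
fragment_data->SetValue(bv_ss.str());
pmol->SetData(fragment_data);

// Create a canonical SMILES of the fragment ("OccO" for the example above)
std::string smiles = conv.WriteString(pmol, true);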
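The "SMILES_Fragment" value is plain text, so if the atom numbers are already in a list you may not need an OBBitVec at all. A minimal sketch, assuming OBBitVec's stream operator writes the set bits as "[ 1 2 7 8 ]" and that the SMILES writer parses the value back with OBBitVec::FromString (both worth checking against your Open Babel version); FragmentBitString is a hypothetical helper, not an Open Babel function:

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Open Babel). Builds the text that
// `bv_ss << bv` is ASSUMED to produce for an OBBitVec whose set bits are
// the given 1-based atom indices, e.g. "[ 1 2 7 8 ]".
std::string FragmentBitString(const std::vector<unsigned> &atomIndices)
{
  // std::set drops duplicates and iterates in ascending order, which is
  // the order a bit vector lists its set bits in.
  std::set<unsigned> bits;
  for (unsigned idx : atomIndices) {
    assert(idx >= 1 && "Open Babel atom indices start at 1");
    bits.insert(idx);
  }

  std::ostringstream os;
  os << "[ ";
  for (unsigned idx : bits)
    os << idx << ' ';
  os << "]";
  return os.str();
}
```

Under those assumptions, `fragment_data->SetValue(FragmentBitString({1, 2, 7, 8}));` would stand in for the OBBitVec and stringstream steps.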

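Whichever canonicalization route is used (re-reading and re-writing the fragments as Geoff suggests below, or the fragment writer above), the frequency table still has to be merged by key. A minimal sketch, assuming the normalization is passed in as a function; ConsolidateFragments is a hypothetical name, and the sketch does not canonicalize anything itself:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sums fragment frequencies after mapping each SMILES
// string to a normalized key. The caller supplies `normalize` (for example,
// a function that writes a canonical SMILES); it is NOT implemented here.
std::map<std::string, double> ConsolidateFragments(
    const std::vector<std::pair<std::string, double>> &fragments,
    const std::function<std::string(const std::string &)> &normalize)
{
  assert(normalize && "a normalization function is required");
  std::map<std::string, double> totals;
  for (const auto &frag : fragments)
    totals[normalize(frag.first)] += frag.second;  // merge equal keys
  return totals;
}
```

Keeping the normalization separate means the same merge works whether the keys come from re-written SMILES or from canonical labels.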

Craig

On Mon, May 21, 2018 at 5:41 AM, Naruki Yoshikawa <naruki.yoshik...@gmail.com> wrote:

> Hi Noel,
>
> I updated the fragmentation code.
> My code is available at https://github.com/n-yoshikawa/contributed/blob/master/c%2B%2B/fragments/obfragment.cpp
>
> I enumerated the ten most frequent fragments in our data using this code.
> The result was as follows:
>
> SMILES                     percent
> [C]1=CC=[C]C=C1            11.1425
> [C]1=CC=CC=C1              7.94308
> [C]1=CC=C[C]=C1            4.3803
> [C]1=CC=[C][C]=C1          2.76544
> [C]1=C[C]=[C]C=C1          2.74526
> [CH]1[CH][CH][CH]O1        2.12959
> [CH]1[CH][CH][CH]O[CH]1    1.87727
> [C]1=CC=CC=[C]1            1.77634
> [C]1=CC=C[C]=[C]1          1.31207
> [C]1=C[C]=[C][C]=C1        1.18086
>
>
> These fragments have some common parts.
> I want to consolidate these into more common fragments.
>
> As Geoff says:
> > Most of these are benzene or other 6-membered aromatic rings.
> > So 8 of them should consolidate to something like `c1ccccc1` and the
> > other two look like 5-membered and 6-membered sugars, which makes sense.
> >
> > I think the key problem is that the code is generating radicals (e.g.,
> > the [C] pieces).
> > My suggestion would be to take these SMILES fragments, read them in
> > again and write them out again.
> > But I think Noel has a new way of generating canonical SMILES from
> > fragments. I'd suggest posting to the list and asking. Either way would
> > consolidate all these strings into c1ccccc1.
>
> Do you have any suggestion about generating canonical SMILES from
> fragments or consolidating fragments?
>
> Thanks,
> Naruki
>
> On Thu, May 17, 2018 at 8:45 PM, Noel O'Boyle <baoille...@gmail.com> wrote:
>
>> It was on GitHub. Here you go: https://github.com/openbabel/openbabel/pull/1712
>>
>> Are you sure you don't just want the canonical labels? I'm happy to
>> review...
>>
>> On 17 May 2018 at 11:47, Geoffrey Hutchison <geoff.hutchi...@gmail.com> wrote:
>>
>>> Hi Noel,
>>>
>>> I'm working with Naruki, the GSoC student developing
>>> fragment-based coordinate generation.
>>>
>>> He's updating my old fragmentation code, which used the SMILES Atom
>>> Order data to canonicalize fragments. I can't find your comments on this,
>>> and I don't remember whether it was in the GitHub tracker or on the Open
>>> Babel development list.
>>>
>>> Thanks,
>>> -Geoff
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> OpenBabel-Devel mailing list
>> OpenBabel-Devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>
>


-- 
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------