Two points to note:
1. If you break a bond, you should increase the implicit H count of the
attached atoms by the bond order. Otherwise you end up with radicals, as
you've seen.
2. If you copy the substructure instead of fragmenting, then the process
may be simpler as there is an option to adjust the H counts automatically.
See my code at
https://baoilleach.blogspot.co.uk/2018/05/when-all-you-want-is-ring.html
and
https://baoilleach.blogspot.co.uk/2018/05/when-all-you-want-is-ring-part-ii.html
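
To make point 1 concrete, here is a minimal, toolkit-independent sketch in
Python. The Atom and Mol classes and the break_bond method are hypothetical
stand-ins invented for this illustration; they are not Open Babel's API, and
the sketch only shows the bookkeeping (add the bond order to the implicit H
count of both end atoms when a bond is removed).

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    symbol: str
    implicit_h: int = 0  # hydrogens not stored as explicit atoms


@dataclass
class Mol:
    """Hypothetical toy molecule, used only to illustrate the bookkeeping."""
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)  # frozenset({i, j}) -> order

    def add_atom(self, symbol, implicit_h=0):
        self.atoms.append(Atom(symbol, implicit_h))
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order

    def break_bond(self, i, j):
        """Delete the i-j bond and cap both ends with hydrogens.

        Each end atom loses `order` units of valence, so its implicit H
        count is raised by the same amount. Skipping this step is what
        leaves radicals behind.
        """
        order = self.bonds.pop(frozenset((i, j)))
        for idx in (i, j):
            self.atoms[idx].implicit_h += order
        return order


# Ethene, H2C=CH2: two carbons with two implicit hydrogens each.
ethene = Mol()
c1 = ethene.add_atom("C", implicit_h=2)
c2 = ethene.add_atom("C", implicit_h=2)
ethene.add_bond(c1, c2, order=2)

ethene.break_bond(c1, c2)
print([(a.symbol, a.implicit_h) for a in ethene.atoms])
```

Breaking the double bond turns each CH2 into CH4 rather than leaving a
carbon with two unfilled valences.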

- Noel
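
The effect of the missing hydrogens on the fragment table quoted below can
be seen with a purely text-level sketch in Python. This is an assumption-laden
illustration, not a canonicalization method: it rewrites only the bracket
atoms [C] and [CH] as plain C (so the hydrogens become implicit again), and
it groups by string, which works here only because these particular SMILES
already share the same atom ordering. The function names are made up for the
example; a real pipeline would fix the H counts in the toolkit and write
canonical SMILES.

```python
import re
from collections import defaultdict

# Fragment SMILES and percentages copied from the table in this thread.
FRAGMENTS = [
    ("[C]1=CC=[C]C=C1", 11.1425),
    ("[C]1=CC=CC=C1", 7.94308),
    ("[C]1=CC=C[C]=C1", 4.3803),
    ("[C]1=CC=[C][C]=C1", 2.76544),
    ("[C]1=C[C]=[C]C=C1", 2.74526),
    ("[CH]1[CH][CH][CH]O1", 2.12959),
    ("[CH]1[CH][CH][CH]O[CH]1", 1.87727),
    ("[C]1=CC=CC=[C]1", 1.77634),
    ("[C]1=CC=C[C]=[C]1", 1.31207),
    ("[C]1=C[C]=[C][C]=C1", 1.18086),
]

# Matches only the neutral carbon bracket atoms seen above: [C] and [CH].
# Charged, isotopic, chiral or aromatic bracket atoms are left untouched.
RADICAL_CARBON = re.compile(r"\[CH?\]")


def fill_hydrogens(smiles):
    """Rewrite [C]/[CH] as C so the missing hydrogens become implicit."""
    return RADICAL_CARBON.sub("C", smiles)


def consolidate(fragments):
    """Sum the percentages of fragments that become the same string."""
    totals = defaultdict(float)
    for smiles, percent in fragments:
        totals[fill_hydrogens(smiles)] += percent
    return dict(totals)


totals = consolidate(FRAGMENTS)
for smiles, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{smiles}\t{percent:.5f}")
```

With the radicals removed, the ten entries collapse to three: the eight
six-membered carbon rings become one entry (about 33.2% in total), and the
two oxygen-containing rings remain separate.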

On 21 May 2018 at 13:41, Naruki Yoshikawa <naruki.yoshik...@gmail.com>
wrote:

> Hi Noel,
>
> I updated the fragmentation code.
> My code is available at
> https://github.com/n-yoshikawa/contributed/blob/master/c%2B%2B/fragments/obfragment.cpp
>
> I enumerated the ten most frequent fragments in our data using this code.
> The results were as follows:
>
> SMILES                     percent
> [C]1=CC=[C]C=C1            11.1425
> [C]1=CC=CC=C1              7.94308
> [C]1=CC=C[C]=C1            4.3803
> [C]1=CC=[C][C]=C1          2.76544
> [C]1=C[C]=[C]C=C1          2.74526
> [CH]1[CH][CH][CH]O1        2.12959
> [CH]1[CH][CH][CH]O[CH]1    1.87727
> [C]1=CC=CC=[C]1            1.77634
> [C]1=CC=C[C]=[C]1          1.31207
> [C]1=C[C]=[C][C]=C1        1.18086
>
> Many of these fragments share the same underlying structure.
> I want to consolidate them into a smaller set of more general fragments.
>
> As Geoff says:
> > Most of these are benzene or other 6-membered aromatic rings.
> > So 8 of them should consolidate to something like `c1ccccc1` and the
> > other two look like 5-membered and 6-membered sugars, which makes sense.
> >
> > I think the key problem is that the code is generating radicals (e.g.,
> > the [C] pieces).
> > My suggestion would be to take these SMILES fragments, read them in
> > again and write out again.
> > But I think Noel has a new way of generating canonical SMILES from
> > fragments. I’d suggest posting to the list and asking. Either way would
> > consolidate all these strings into c1ccccc1
>
> Do you have any suggestions for generating canonical SMILES from
> fragments, or for consolidating fragments?
>
> Thanks,
> Naruki
>
> On Thu, 17 May 2018 at 20:45, Noel O'Boyle <baoille...@gmail.com> wrote:
>
>> It was on GitHub. Here you go:
>> https://github.com/openbabel/openbabel/pull/1712
>>
>> Are you sure you don't just want the canonical labels? I'm happy to
>> review...
>>
>> On 17 May 2018 at 11:47, Geoffrey Hutchison <geoff.hutchi...@gmail.com>
>> wrote:
>>
>>> Hi Noel,
>>>
>>> I'm working with Naruki, the GSoC student developing the
>>> fragment-based coordinate generation.
>>>
>>> He's updating my old fragmentation code, which used the SMILES Atom
>>> Order data to canonicalize fragments. I can't find your comments on this,
>>> and I don't remember whether it was in the GitHub tracker or Open Babel
>>> development list.
>>>
>>> Thanks,
>>> -Geoff
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> OpenBabel-Devel mailing list
>> OpenBabel-Devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>
>