Your radius and bitvector length are probably too small for a training set
of that size. Either you have bit collisions, or the radius is not large
enough to capture the differences between substructures; that is likely why
you see this artifact. Try radius 3 with a bitvector length of 4096. I think
you have enough training samples to go up to a bitvector length of 8192
without overfitting the network, although that will make training much slower.
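
To check whether longer bitvectors actually remove the clashes, something
like the rough sketch below might help. It is plain Python on made-up toy
data, not RDKit code: the helper names (fold, conflicting_groups), the
feature ids and the melting points are all hypothetical, and the modulo
folding is a simplifying assumption about how hashed substructure ids end up
in a fixed-length vector. With real molecules you would build the keys from
RDKit instead, e.g. AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=4096)
followed by ToBitString().

```python
from collections import defaultdict


def fold(feature_ids, n_bits):
    """Fold sparse feature ids into a fixed-length bit set.

    Assumption: folding is a plain modulo, so two different ids can
    land on the same bit when n_bits is small.
    """
    return frozenset(fid % n_bits for fid in feature_ids)


def conflicting_groups(fingerprints, targets, tolerance):
    """Return index groups that share a fingerprint but whose target
    values differ by more than `tolerance`."""
    groups = defaultdict(list)
    for idx, fp in enumerate(fingerprints):
        groups[fp].append(idx)

    conflicts = []
    for members in groups.values():
        if len(members) < 2:
            continue
        values = [targets[i] for i in members]
        if max(values) - min(values) > tolerance:
            conflicts.append(members)
    return conflicts


# Toy data (hypothetical): three "molecules" described by hashed feature
# ids, with made-up melting points in degrees C.
toy_features = [{5, 70}, {5, 134}, {9, 200}]
toy_mp = [80.0, 210.0, 150.0]

short_fps = [fold(f, 64) for f in toy_features]    # very short vector
long_fps = [fold(f, 4096) for f in toy_features]   # longer vector

short_conflicts = conflicting_groups(short_fps, toy_mp, tolerance=10.0)
long_conflicts = conflicting_groups(long_fps, toy_mp, tolerance=10.0)
print("64 bits:", short_conflicts)
print("4096 bits:", long_conflicts)
```

If groups still show up at 4096 or 8192 bits, the clash is not a folding
collision: those molecules have the same substructures at that radius, so a
larger radius (or a different descriptor) is what would have to separate them.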

On Wed, 10 Oct 2018 at 14:15, Michal Krompiec <michal.kromp...@gmail.com>
wrote:

> Hi Thomas,
> Radius 2, 2048 bits, 5200 data points.
>
> On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis <teva...@gmail.com>
> wrote:
>
>> What's your bitvector length and radius? How many training samples do you
>> have?
>>
>> On Wed, 10 Oct 2018 at 13:51, Michal Krompiec <michal.kromp...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>> I have a slightly off-topic question. I'm trying to train a neural
>>> network on a dataset of small molecules and their melting points. I did get
>>> a not-so-bad accuracy with Morgan fingerprints, but I've realised that
>>> regardless of FP radius and bitvector length, several dozen molecules have
>>> the same fingerprints but wildly different melting points. I am pretty sure
>>> this is a "solved problem" so I don't want to reinvent the wheel. What is
>>> the recommended/usual way of dealing with this?
>>> Thanks,
>>> Michal
>>>
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
>>

-- 

======================================================================

Dr Thomas Evangelidis

Research Scientist

IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy
of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>
Prague, Czech Republic
  &
CEITEC - Central European Institute of Technology <https://www.ceitec.eu/>
Brno, Czech Republic

email: teva...@gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/
