Re: [Rdkit-discuss] Generating Fingerprints from Smiles or Mol

Jason Ochoada Fri, 18 Jan 2019 08:56:07 -0800

Awesome!  Thanks for all the help!

Jason


On Sat, Jan 5, 2019, 1:21 AM Greg Landrum <greg.land...@gmail.com> wrote:

>
>
> On Fri, Jan 4, 2019 at 1:59 PM Jason Ochoada <jocho...@gmail.com> wrote:
>
>>
>> Thanks so much for taking the time to help!  I didn't realize the size
>> limit recommendation for pandas so maybe that's why I don't see much of it.
>>
>
> Yeah, Pandas is designed to keep the entire dataframe in memory. This
> makes things tricky with large datasets.
> For similarity searches across large sets, chemfp is really a great way to
> go (and it's even better if you license the commercial version).
>
>
>>   I often work on much larger scale and was investigating moving from
>> KNIME to RDKit on Linux for that reason.  The curve is just steep right now
>> :) learning python, pandas, RDKit etc. all at once!  I'll start
>> digging/searching for the more traditional straight python ways to do the
>> same.
>>
>
> Yeah, there's a lot there to pick up, but it sounds like you're making a
> great start... good luck with it and please do keep asking questions as you
> encounter problems!
>
> -greg
>
>
>
>> Thanks again for the info and help!
>> Jason
>> St. Jude Children's Research Hospital
>>
>> On Fri, Jan 4, 2019 at 1:34 AM Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Hi Jason,
>>>
>>> This gist shows how to generate fingerprints for the molecules in a
>>> pandas dataframe and then use them to do similarity searches:
>>> https://gist.github.com/greglandrum/045ccf8009fde91fc985864e70ee72a1
>>>
>>> This is a reasonably efficient way of working with a smallish (<10K)
>>> number of molecules.
>>>
>>> -greg
>>>
>>>
>>> On Thu, Jan 3, 2019 at 7:10 PM Jason Ochoada <jocho...@gmail.com> wrote:
>>>
>>>> Hi Everyone!
>>>>
>>>> I'm a newbie making the shift from RDKit in KNIME to working with the
>>>> full package.  I have been working (hacking) my way through the tutorials I
>>>> could find on pandas, Jupyter, RDKit etc.  I'm using RDKit in the Anaconda 3
>>>> environment.  I'm struggling to figure out how to do what I imagine is a
>>>> very simple task.  I have read in a flat file (Smiles file) and have it in
>>>> a pandas data frame named cpds.  It contained SMILES and ID.  I have been
>>>> able to add a molecule to the dataframe:
>>>>
>>>>
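>>>> from rdkit.Chem import PandasTools
>>>> # Add an RDKit molecule column built from the SMILES column
>>>> PandasTools.AddMoleculeColumnToFrame(cpds, 'SMILES', 'Molecule', includeFingerprints=False)
>>>> print([str(x) for x in cpds.columns])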
>>>>
>>>> But I can't seem to figure out how to create and append a fingerprint.
>>>> I'm open to any options as I'm new and don't have any particular structure
>>>> I like to work in.  Of course once I have this I'd like to do similarity
>>>> searches either in RDKit or chemfp etc. someday.
>>>>
>>>> Can you point me to where this might have been done?  I've searched and
>>>> searched but I can't seem to find a solution that will work for me.
>>>>
>>>> Thanks,
>>>> Jason Ochoada
>>>> St. Jude Children's Research Hospital
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>
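
For anyone finding this thread later, here is a minimal sketch of the kind of
workflow discussed above: add a fingerprint column to a pandas dataframe, then
run a Tanimoto similarity search against it. This is not the code from the
linked gist. The toy compounds, the choice of Morgan fingerprints (radius 2,
2048 bits), and the column names FP and Similarity are all assumptions made
for illustration; only the dataframe name cpds and the SMILES, ID and Molecule
columns come from the thread.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Hypothetical stand-in for the `cpds` dataframe described in the thread
# (one SMILES column and one ID column).
cpds = pd.DataFrame({
    "ID": ["cpd1", "cpd2", "cpd3", "cpd4"],
    "SMILES": ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O"],
})

# Parse each SMILES into an RDKit molecule. MolFromSmiles returns None for
# strings it cannot parse, so drop those rows before fingerprinting.
cpds["Molecule"] = cpds["SMILES"].apply(Chem.MolFromSmiles)
cpds = cpds[cpds["Molecule"].notnull()].reset_index(drop=True)

# Assumed fingerprint type: Morgan bit vectors, radius 2, 2048 bits.
fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
cpds["FP"] = cpds["Molecule"].apply(lambda mol: fpgen.GetFingerprint(mol))

# Use the first compound as the query and score every row against it.
query_fp = cpds.loc[0, "FP"]
cpds["Similarity"] = DataStructs.BulkTanimotoSimilarity(query_fp, list(cpds["FP"]))

# Most similar compounds first (the query itself scores 1.0).
hits = cpds.sort_values("Similarity", ascending=False)
print(hits[["ID", "SMILES", "Similarity"]])
```

As noted earlier in the thread, keeping molecules and fingerprints in a
dataframe like this is only reasonable for a smallish number of compounds,
since everything is held in memory.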

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
