Awesome! Thanks for all the help! Jason
On Sat, Jan 5, 2019, 1:21 AM Greg Landrum <greg.land...@gmail.com> wrote:

> On Fri, Jan 4, 2019 at 1:59 PM Jason Ochoada <jocho...@gmail.com> wrote:
>
>> Thanks so much for taking the time to help! I didn't realize the size
>> limit recommendation for pandas so maybe that's why I don't see much of it.
>
> Yeah, Pandas is designed to keep the entire dataframe in memory. This
> makes things tricky with large datasets.
> For similarity searches across large sets, chemfp is really a great way to
> go (and it's even better if you license the commercial version).
>
>> I often work on a much larger scale and was investigating moving from
>> KNIME to RDKit on Linux for that reason. The curve is just steep right now
>> :) learning python, pandas, RDKit etc. all at once! I'll start
>> digging/searching for the more traditional straight python ways to do the
>> same.
>
> Yeah, there's a lot there to pick up, but it sounds like you're making a
> great start... good luck with it and please do keep asking questions as you
> encounter problems!
>
> -greg
>
>> Thanks again for the info and help!
>> Jason
>> St. Jude Children's Research Hospital
>>
>> On Fri, Jan 4, 2019 at 1:34 AM Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Hi Jason,
>>>
>>> This gist shows how to generate fingerprints for the molecules in a
>>> pandas dataframe and then use them to do similarity searches:
>>> https://gist.github.com/greglandrum/045ccf8009fde91fc985864e70ee72a1
>>>
>>> This is a reasonably efficient way of working with a smallish (<10K)
>>> number of molecules.
>>>
>>> -greg
>>>
>>> On Thu, Jan 3, 2019 at 7:10 PM Jason Ochoada <jocho...@gmail.com> wrote:
>>>
>>>> Hi Everyone!
>>>>
>>>> I'm a newbie making the shift from RDKit in KNIME to working with the
>>>> full package. I have been working (hacking) my way through the tutorials
>>>> I could find for pandas, Jupyter, RDKit etc. I'm using RDKit in the
>>>> anaconda 3 environment. I'm struggling to figure out how to do what I
>>>> imagine is a very simple task. I have read in a flat file (SMILES file)
>>>> and have it in a pandas data frame named cpds. It contains SMILES and ID.
>>>> I have been able to add a molecule column to the dataframe:
>>>>
>>>> PandasTools.AddMoleculeColumnToFrame(cpds,'SMILES','Molecule',includeFingerprints=False)
>>>> print([str(x) for x in cpds.columns])
>>>>
>>>> But I can't seem to figure out how to create and append a fingerprint.
>>>> I'm open to any options as I'm new and don't have any particular structure
>>>> I like to work in. Of course once I have this I'd like to do similarity
>>>> searches either in RDKit or chemfp etc. someday.
>>>>
>>>> Can you point me to where this might have been done? I've searched and
>>>> searched but I can't seem to find a solution that will work for me.
>>>>
>>>> Thanks,
>>>> Jason Ochoada
>>>> St. Jude Children's Research Hospital
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
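
For anyone reading this thread later, here is a minimal sketch of the task asked about above: adding a fingerprint column to a pandas dataframe and ranking the rows by similarity to a query. It is not the code from the linked gist. The four-row `cpds` frame, the `FP` and `Similarity` column names, the `morgan_fp` helper, the query SMILES, and the choice of Morgan fingerprints (radius 2, 2048 bits) are all assumptions made for illustration. As noted in the thread, this in-memory approach suits smallish sets; chemfp is the suggestion for large ones.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Hypothetical stand-in for the `cpds` frame from the thread (SMILES and ID columns).
cpds = pd.DataFrame({
    "ID": ["cpd1", "cpd2", "cpd3", "cpd4"],
    "SMILES": ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O"],
})

# Build the molecule column directly; PandasTools.AddMoleculeColumnToFrame
# (used in the original question) would serve the same purpose.
cpds["Molecule"] = cpds["SMILES"].apply(Chem.MolFromSmiles)


def morgan_fp(mol, radius=2, n_bits=2048):
    """Hypothetical helper: Morgan bit-vector fingerprint for one molecule.

    Assumption: Morgan/radius 2/2048 bits is just one reasonable choice.
    Newer RDKit releases also offer rdFingerprintGenerator for this.
    """
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)


# Append the fingerprint column (one ExplicitBitVect object per row).
cpds["FP"] = cpds["Molecule"].apply(morgan_fp)

# Similarity search: Tanimoto similarity of every row against one query molecule.
query_fp = morgan_fp(Chem.MolFromSmiles("c1ccccc1O"))
cpds["Similarity"] = DataStructs.BulkTanimotoSimilarity(query_fp, list(cpds["FP"]))

# Most similar compounds first.
hits = cpds.sort_values("Similarity", ascending=False)[["ID", "SMILES", "Similarity"]]
print(hits)
```

With a real SMILES file, rows whose SMILES fail to parse would give `None` in the `Molecule` column, so those rows would need to be dropped before the fingerprint step.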