This line of code works for me on a data frame with over 6M compounds …

PandasTools.AddMoleculeColumnToFrame(df, 'smiles', 'mol', 
includeFingerprints=True)

‘smiles’ is the name of the column containing the SMILES, ‘mol’ is the name of 
the new column with the mol objects.

Once that’s done, you can address the rows where mol is ‘None’ …
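
As a rough sketch of what "addressing the rows where mol is None" could look like (the toy data frame and the names `failed` and `clean` are made up for illustration, and fingerprints are left off to keep it short):

```python
import pandas as pd
from rdkit import RDLogger
from rdkit.Chem import PandasTools

# Silence RDKit's parse-error messages for the deliberately bad entry.
RDLogger.DisableLog('rdApp.*')

# Hypothetical toy data; the second SMILES is intentionally invalid.
df = pd.DataFrame({'smiles': ['CCO', 'not_a_smiles', 'c1ccccc1']})

# Same call as above; unparsable SMILES end up as None in the 'mol' column.
PandasTools.AddMoleculeColumnToFrame(df, 'smiles', 'mol')

failed = df[df['mol'].isna()]   # rows whose SMILES could not be parsed
clean = df[df['mol'].notna()]   # rows with a usable mol object
```

Here `failed['smiles']` lists the offending input strings, and `clean` can be passed on to whatever comes next.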

From: Mike Mazanetz <mi...@novadatasolutions.co.uk>
Sent: Thursday, October 31, 2019 8:54 AM
To: 'Fiorella Ruggiu' <ruggiu.fiore...@gmail.com>
Cc: 'RDKit Discuss' <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] failed mols in converting SMILES to Pandas 
dataframe Molecule

Hi Fio,

Thanks for the tips.  I’ve found that I need PandasTools to convert SMILES to 
a mol, though; I’ve not had MolFromSmiles work on a dataframe.
Have you found that this works?

Cheers,
mike

From: Fiorella Ruggiu <ruggiu.fiore...@gmail.com>
Sent: 31 October 2019 15:48
To: Mike Mazanetz <mi...@novadatasolutions.co.uk>
Cc: Jan Halborg Jensen <jhjen...@chem.ku.dk>; RDKit Discuss 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] failed mols in converting SMILES to Pandas 
dataframe Molecule

Hello Mike,

you could create a function with your if else structure and use apply on the 
pandas dataframe. For example, if you have a SMILES column in your df:

from rdkit import Chem

def addMol(smiles):
    mol = Chem.MolFromSmiles(smiles)  # parse once; None means the SMILES failed
    if mol is None:
        # Etc
        return None  # or whatever you wish to return when it fails
    else:
        # Etc
        return mol

df['RDKitMol'] = df['SMILES'].apply(addMol)

Might not be as efficient as the built-in PandasTools though.
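
A slightly fuller sketch of the apply approach, in case it helps. The helper name `smiles_to_mol`, the `parse_ok` column, and the toy data are assumptions for illustration only. It also guards against missing cells, since passing a non-string to MolFromSmiles raises an error instead of returning None:

```python
import pandas as pd
from rdkit import Chem, RDLogger

# Silence RDKit's parse-error messages for the deliberately bad entry.
RDLogger.DisableLog('rdApp.*')


def smiles_to_mol(smiles):
    """Return an RDKit Mol, or None when the input is missing or unparsable."""
    if not isinstance(smiles, str):    # e.g. None/NaN from an empty cell
        return None
    return Chem.MolFromSmiles(smiles)  # returns None itself on a parse failure


# Hypothetical toy data: one missing value and one unclosed ring ('C1CC').
df = pd.DataFrame({'SMILES': ['CCO', None, 'C1CC', 'c1ccccc1']})

df['RDKitMol'] = df['SMILES'].apply(smiles_to_mol)
df['parse_ok'] = df['RDKitMol'].notna()  # flag failures instead of dropping rows
```

Keeping the failed rows with a flag means the frame keeps its original length and index, so a "missing row entry" stays visible rather than silently disappearing.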

Best,
Fio

On Thu, Oct 31, 2019 at 8:07 AM Mike Mazanetz 
<mi...@novadatasolutions.co.uk> wrote:
Dear RDKit’ers

I’ve been trying to skip failed molecules in 
PandasTools.AddMoleculeColumnToFrame.

This is possible if I chuck each row to a different processor, but what I 
really want to do is return a missing row entry.

Normally I’d go:
if mol is None:
    etc
else:
    etc

But Pandas DFs seem to be playing hardball.

Any thoughts?

Cheers,
mike
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss