Re: [Rdkit-discuss] appending new rows in dataframe with stereo-isomers

Ines Smit Sun, 30 Aug 2020 06:27:38 -0700

Hi Marawan,

I'm not sure this is the cause of the problem, but regarding your line


 input_smiles_df.append(new_row,ignore_index=True)

note that, unlike appending items to a list in Python, df.append() returns a
new dataframe instead of adding the row in place.
So perhaps you need to reassign the dataframe like:

input_smiles_df = input_smiles_df.append(new_row,ignore_index=True)

although actually it is recommended to collect the rows in a list and then use 
pd.concat() once instead, according to the notes here: 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
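
A minimal sketch of that list-then-concatenate pattern with pd.concat(), in case it helps. The dataframe contents and the column names "id" and "standardized_smiles" are made up for illustration and are not taken from your data:

```python
import pandas as pd

# Hypothetical stand-in for input_smiles_df; the columns are illustrative only.
input_smiles_df = pd.DataFrame({"id": ["mol1"], "standardized_smiles": ["CCO"]})

# Collect the new rows as plain dicts in a list first ...
new_rows = []
for smi in ["C[C@H](O)F", "C[C@@H](O)F"]:
    new_rows.append({"id": "mol2", "standardized_smiles": smi})

# ... then build one dataframe from them and concatenate once.
# pd.concat() also returns a new dataframe, so the result is reassigned.
input_smiles_df = pd.concat(
    [input_smiles_df, pd.DataFrame(new_rows)], ignore_index=True
)
print(len(input_smiles_df))
```

As far as I know, DataFrame.append() was removed in pandas 2.0, so this route should also keep working on newer pandas versions.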

I also noticed that, according to the documentation, EnumerateStereoisomers by 
default only enumerates stereocenters that are undefined, so a molecule whose 
stereocenters are all assigned gives back a single isomer 
(http://rdkit.org/docs/source/rdkit.Chem.EnumerateStereoisomers.html).

I don't think I can see the SMILES you are actually working with but see the 
difference in this example:

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

# Stereocenter left undefined in the SMILES
thalodamide = Chem.MolFromSmiles('O=C1CCC(N2C(=O)c3ccccc3C2=O)C(=O)N1')
isomers = tuple(EnumerateStereoisomers(thalodamide))
print(len(isomers))

# Same molecule with the stereocenter assigned ([C@H])
thalodamide2 = Chem.MolFromSmiles('O=C1CC[C@H](N2C(=O)c3ccccc3C2=O)C(=O)N1')
isomers = tuple(EnumerateStereoisomers(thalodamide2))
print(len(isomers))

Output:

2
1


The option onlyUnassigned=False changes this behaviour (see documentation). 
Could this be why you are only getting back 1 every time from your 
print("Number of stereoisomer is: ", len(isomers_list)) ?
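
For what it's worth, here is a small sketch of the same example with onlyUnassigned=False, assuming the StereoEnumerationOptions class you are already importing. It only illustrates the option and is not a drop-in fix for your script:

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)

# Input with its single stereocenter already assigned ([C@H]).
mol = Chem.MolFromSmiles('O=C1CC[C@H](N2C(=O)c3ccccc3C2=O)C(=O)N1')

# onlyUnassigned=False asks for assigned stereocenters to be enumerated too.
opts = StereoEnumerationOptions(onlyUnassigned=False)
isomers = tuple(EnumerateStereoisomers(mol, options=opts))

# Isomeric SMILES of the enumerated molecules, sorted for stable printing.
smiles = sorted(Chem.MolToSmiles(m) for m in isomers)
print(len(isomers))
for smi in smiles:
    print(smi)
```

With this option both enantiomers should come back instead of only the input structure.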

Not sure this solves your problem but perhaps worth checking.

Regards,
Ines
________________________________
From: Marawan Hussien via Rdkit-discuss <[email protected]>
Sent: 30 August 2020 05:05
To: [email protected] <[email protected]>
Subject: [Rdkit-discuss] appending new rows in dataframe with stereo-isomers

Hi,
I am trying to append to (update) a pandas dataframe (created by PandasTools from a 
CSV file) with the potential stereoisomers of each molecule in the dataframe.
My understanding is that the EnumerateStereoisomers function returns a 
generator that I can loop through, using each mol object (or the SMILES created 
with the Chem.MolToSmiles function) to create a new row and append it 
to the end of the dataframe. I tried the following code, but nothing is 
appended:

from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers, StereoEnumerationOptions

def generate_Stereoisomers(x):
    opts = StereoEnumerationOptions(tryEmbedding=True)
    isomers = tuple(EnumerateStereoisomers(x, options=opts))
    return isomers

input_smiles_df["stereo_isomers"] = input_smiles_df["Cannonical_tautomer"].apply(lambda m: generate_Stereoisomers(m))

for index, row in input_smiles_df.iterrows():

    isomers_list = row["stereo_isomers"]

    # This line always gives 1 back, although the molecules have many stereocenters
    print("Number of stereoisomer is: ", len(isomers_list))

    for smi in sorted(rdkit.Chem.MolToSmiles(x, isomericSmiles=True) for x in isomers_list):

        print(smi)

        new_row = {'Cannonical_tautomer': None,
                   id_col_name: str(row[id_col_name]),
                   smiles_col_name: row[smiles_col_name],
                   'standardized_smiles': smi,
                   'num_stereo_isomers': row["num_stereo_isomers"]}

        input_smiles_df.append(new_row, ignore_index=True)


Any suggestion ?

Thanks

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
