Sayth Renshaw wrote: > Hi > > Having fun with pandas filtering a work excel file. > My current script opens selected and filters the data and saves as excel. > > import pandas as pd > import numpy as np > > log = pd.read_excel("log_dump_py.xlsx") > df = log.filter(items=['Completed', 'Priority', 'Session date', > 'Consultant', 'Coach', > 'Delivery Method', 'Focus of Coaching', 'Leader', 'Site', > 'Coaching Description','Motor/Property', > ],) > completed_tasks = df.loc[(df['Completed'] == 'Yes') & > (df['Motor/Property'] == 'Motor') & (df['Delivery Method'] == 'Group > Coaching')] print(completed_tasks.head(n=5)) > completed_tasks.to_excel("filtered_logs.xlsx") > > This leaves me with a set of several columns. The main column of concern > for this example is a consultant > > Session date Consultant > 2019-06-21 11:15:00 WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE, > Colem;#17230; > > How can I split the consultant column, keep only names and drop the > numbers and for every session date create a line with data and consultants > name? > > NB. There are varied amounts of consultants so splitting across columns is > uneven. if it was even melt seems like it would be good > https://dfrieds.com/data-analysis/melt-unpivot-python-pandas > > > Thanks > > Sayth
Since I didn't find a cool shortcut I decided to use brute force: $ cat pandas_explode_column.py import pandas as pd df = pd.DataFrame( [ [ "2019-06-21 11:15:00", "WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE, Colem;#17230;" ], [ "2019-06-22 10:00:00", "Doe, John;#42;Robbins, Rita;" ] ], columns=["Session date", "Consultant"] ) def explode_consultants(consultants): consultants = (c.lstrip("#") for c in consultants.split(";")) return (c for c in consultants if c.strip("0123456789")) def explode_column(df, column, split): for _index, row in df.iterrows(): for part in split(row[column]): yield [part if c == column else row[c] for c in df.columns] def explode(df, column, split): return pd.DataFrame( explode_column(df, "Consultant", split), columns=df.columns ) df2 = explode(df, "Consultant", explode_consultants) print(df) print(df2) $ python3 pandas_explode_column.py Session date Consultant 0 2019-06-21 11:15:00 WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE... 1 2019-06-22 10:00:00 Doe, John;#42;Robbins, Rita; [2 rows x 2 columns] Session date Consultant 0 2019-06-21 11:15:00 WNEWSKI, Joan 1 2019-06-21 11:15:00 BALIN, Jock 2 2019-06-21 11:15:00 DUNE, Colem 3 2019-06-22 10:00:00 Doe, John 4 2019-06-22 10:00:00 Robbins, Rita [5 rows x 2 columns] $ -- https://mail.python.org/mailman/listinfo/python-list