Sayth Renshaw wrote:

> Peter Otten wrote:

>> def explode_consultants(consultants):

Should have called that split_consultants(); takes a string and

>>     consultants = (c.lstrip("#") for c in consultants.split(";"))

splits by ";", removes leading "#"

>>     return (c for c in consultants if c.strip("0123456789"))

filters out the digit-only values.

>> def explode_column(df, column, split):
>>     for _index, row in df.iterrows():

iterates over the rows of the data frame

>>         for part in split(row[column]):

iterates over the names extracted by explode_consultants() from the
"Consultant" field

>>             yield [part if c == column else row[c] for c in df.columns]

Makes one row per consultant, replacing the contents of the Consultant
column with the current consultant (bound to the part variable), e. g.
with part = "Doe, John":

    yield [
        "Doe, John" if c == "Consultant" else row[c]
        for c in ["Session date", "Consultant"]
    ]

>> def explode(df, column, split):

Makes a new data frame from the rows generated by explode_column(),
reusing the column names from the original data frame.

>>     return pd.DataFrame(
>>         explode_column(df, "Consultant", split), columns=df.columns

Here's a bug: "Consultant" is hardcoded, but the column argument should
be used instead.

>>     )
>> df2 = explode(df, "Consultant", explode_consultants)
>>
>> print(df)
>> print(df2)
>> $ python3 pandas_explode_column.py
>>          Session date                                         Consultant
>> 0 2019-06-21 11:15:00  WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE...
>> 1 2019-06-22 10:00:00                       Doe, John;#42;Robbins, Rita;
>>
>> [2 rows x 2 columns]
>>          Session date     Consultant
>> 0 2019-06-21 11:15:00  WNEWSKI, Joan
>> 1 2019-06-21 11:15:00    BALIN, Jock
>> 2 2019-06-21 11:15:00    DUNE, Colem
>> 3 2019-06-22 10:00:00      Doe, John
>> 4 2019-06-22 10:00:00  Robbins, Rita
>>
>> [5 rows x 2 columns]
>> $
>
> Mind a little blown :-). Going to have to play and break this several
> times to fully get it.

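In case it helps with the playing and breaking, here is a sketch of the whole thing as one script, with the column argument passed through instead of the hardcoded "Consultant". The sample frame is made up for illustration (the first row of the original data is truncated in the output above, so I did not try to reproduce it), and the splitter uses the split_consultants() name mentioned above.

```python
import pandas as pd


def split_consultants(consultants):
    # Split on ";", drop a leading "#", and skip digit-only or empty parts.
    parts = (c.lstrip("#") for c in consultants.split(";"))
    return (c for c in parts if c.strip("0123456789"))


def explode_column(df, column, split):
    # Yield one output row per value produced by split(row[column]).
    for _index, row in df.iterrows():
        for part in split(row[column]):
            yield [part if c == column else row[c] for c in df.columns]


def explode(df, column, split):
    # Pass the column argument through rather than a hardcoded name.
    return pd.DataFrame(explode_column(df, column, split), columns=df.columns)


# Hypothetical sample data, not the original poster's frame.
df = pd.DataFrame(
    [
        ["2019-06-21 11:15:00", "Roe, Jane;#7;#Poe, Pat;#8"],
        ["2019-06-22 10:00:00", "Doe, John;#42;Robbins, Rita;"],
    ],
    columns=["Session date", "Consultant"],
)

df2 = explode(df, "Consultant", split_consultants)
print(df2)
```

With the fix in place explode() no longer depends on the column being called "Consultant", so it can be reused for other frames and other split functions.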
Here's another usage example for explode() which "explodes" the "ham"
column using range():

>>> df3 = pd.DataFrame([[2, "foo"], [3, "bar"], [0, "baz"]], columns=["ham", "spam"])
>>> explode(df3, "ham", range)
   ham  spam
0    0   foo
1    1   foo
2    0   bar
3    1   bar
4    2   bar

[5 rows x 2 columns]
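The row-generating logic does not really need pandas, which may make it easier to step through. Below is a rough sketch of the same nested loop over plain lists; explode_rows() is a hypothetical helper written for this illustration, not part of pandas or of the script above.

```python
def explode_rows(rows, columns, column, split):
    # rows is a list of lists, columns the matching list of column names.
    idx = columns.index(column)
    for row in rows:
        # One output row per value produced by split(); none if split()
        # yields nothing.
        for part in split(row[idx]):
            yield [part if i == idx else value for i, value in enumerate(row)]


columns = ["ham", "spam"]
rows = [[2, "foo"], [3, "bar"], [0, "baz"]]

result = list(explode_rows(rows, columns, "ham", range))
print(result)
# [[0, 'foo'], [1, 'foo'], [0, 'bar'], [1, 'bar'], [2, 'bar']]
```

The "baz" row produces no output because range(0) is empty, so the inner loop never runs for it. The same holds for any split function that returns an empty iterable.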