data = { "Employee ID": [12345, 12346, 12347, 12348, 12349], "Name": ["Dummy x", "Dummy y", "Dummy z", "Dummy a", "Dummy b"], "Client": ["Dummy a", "Dummy b", "Dummy c", "Dummy d", "Dummy e"], "Project": ["abc", "def", "ghi", "jkl", "mno"], "Team": ["team a", "team b", "team c", "team d", "team e"], "01/01/2022": ["OFF", "WO", "WH", "WH", "OFF"], "02/01/2022": ["WO", "WO", "OFF", "WH", "WH"], "03/01/2022": ["WH", "WH", "WH", "OFF", "WO"], "04/01/2022": ["WH", "WO", "WO", "WH", "OFF"], "05/01/2022": ["WH", "WH", "OFF", "WO", "WO"], }
df = ps.DataFrame(data) # Define dates columns dates_columns = df.columns[5:] # Melt the dataframe and count the occurrences df_melt = df.melt(id_vars=df.columns[:5], value_vars=dates_columns, var_name="Date", value_name="Status") df_counts = df_melt.groupby(["Date", "Status"]).size().unstack() df_counts.sort_index(inplace=True) df_counts [image: image.png] ons. 21. juni 2023 kl. 14:39 skrev Farshid Ashouri < farsheed.asho...@gmail.com>: > You can use selectExpr and stack to achieve the same effect in PySpark: > > > > df = spark.read.csv("your_file.csv", header=True, inferSchema=True) > > date_columns = [col for col in df.columns if '/' in col] > > df = df.selectExpr(["`Employee ID`", "`Name`", "`Client`", "`Project`", > "`Team`”] > + [f"stack({len(date_columns)}, {', '.join([f'`{col}`, `{col}` as > `Status`' for col in date_columns])}) as (`Date`, `Status`)”]) > > result = df.groupby("Date", "Status").count() > > > > > On 21 Jun 2023, at 11:45, John Paul Jayme <john.ja...@tdcx.com.INVALID> > wrote: > > Hi, > > This is currently my column definition : > Employee ID Name Client Project Team 01/01/2022 02/01/2022 03/01/2022 > 04/01/2022 05/01/2022 > 12345 Dummy x Dummy a abc team a OFF WO WH WH WH > As you can see, the outer columns are just daily attendance dates. My goal > is to count the employees who were OFF / WO / WH on said dates. I need to > transpose them so it would look like this : > > <image.png> > > I am still new to pandas. Can you guide me on how to produce this? I am > reading about melt() and set_index() but I am not sure if they are the > correct functions to use. > > > -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297