You can use selectExpr and the SQL stack function to get the same melt-style reshape in PySpark:
df = spark.read.csv("your_file.csv", header=True, inferSchema=True)

# The date columns are the ones whose header contains '/'.
date_columns = [c for c in df.columns if '/' in c]

# stack(n, k1, v1, k2, v2, ...) emits n rows per input row, so pair each
# date header (as a string literal) with the value stored in that column.
stack_expr = ", ".join(f"'{c}', `{c}`" for c in date_columns)

long_df = df.selectExpr(
    "`Employee ID`", "`Name`", "`Client`", "`Project`", "`Team`",
    f"stack({len(date_columns)}, {stack_expr}) as (Date, Status)",
)

result = long_df.groupBy("Date", "Status").count()

> On 21 Jun 2023, at 11:45, John Paul Jayme <john.ja...@tdcx.com.INVALID> wrote:
>
> Hi,
>
> This is currently my column definition:
>
> Employee ID  Name     Client   Project  Team    01/01/2022  02/01/2022  03/01/2022  04/01/2022  05/01/2022
> 12345        Dummy x  Dummy a  abc      team a  OFF         WO          WH          WH          WH
>
> As you can see, the outer columns are just daily attendance dates. My goal is
> to count the employees who were OFF / WO / WH on said dates. I need to
> transpose them so it would look like this:
>
> <image.png>
>
> I am still new to pandas. Can you guide me on how to produce this? I am
> reading about melt() and set_index() but I am not sure if they are the
> correct functions to use.
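A couple of follow-ups. If the layout in your screenshot is one row per date with a column per status, you can pivot the counts from the long table afterwards. A minimal sketch, assuming the status values are exactly OFF, WO and WH:

# One row per Date, one column per status, cell = number of employees.
# Listing the statuses explicitly saves pivot() a pass to discover them.
wide = long_df.groupBy("Date").pivot("Status", ["OFF", "WO", "WH"]).count()
wide.orderBy("Date").show()

And on the pandas question in your mail: melt() is indeed the right function for this unpivot. A rough equivalent, assuming the same file and headers:

import pandas as pd

pdf = pd.read_csv("your_file.csv")
id_vars = ["Employee ID", "Name", "Client", "Project", "Team"]
long_pdf = pdf.melt(id_vars=id_vars, var_name="Date", value_name="Status")
# size() counts rows per (Date, Status); unstack() turns statuses into columns.
counts = long_pdf.groupby(["Date", "Status"]).size().unstack(fill_value=0)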