You can use selectExpr and stack to get the same melt/unpivot effect in PySpark:


df = spark.read.csv("your_file.csv", header=True, inferSchema=True)

date_columns = [col for col in df.columns if '/' in col]

# Pair each date column's name (as a string literal) with its value for stack()
stack_pairs = ", ".join([f"'{col}', `{col}`" for col in date_columns])

df = df.selectExpr("`Employee ID`", "`Name`", "`Client`", "`Project`", "`Team`",
                   f"stack({len(date_columns)}, {stack_pairs}) as (Date, Status)")

result = df.groupby("Date", "Status").count()
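
If the end goal is the layout in the attached screenshot (one row per date with a count column per status), a pivot on the same unpivoted DataFrame may get you closer. A rough sketch, assuming the Date/Status columns produced above:

# One row per Date, one count column per distinct Status value
wide = df.groupBy("Date").pivot("Status").count()
wide.show()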




> On 21 Jun 2023, at 11:45, John Paul Jayme <john.ja...@tdcx.com.INVALID> wrote:
> 
> Hi,
> 
> This is currently my column definition :
> Employee ID   Name      Client    Project   Team     01/01/2022   02/01/2022   03/01/2022   04/01/2022   05/01/2022
> 12345         Dummy x   Dummy a   abc       team a   OFF          WO           WH           WH           WH
> 
> As you can see, the outer columns are just daily attendance dates. My goal is 
> to count the employees who were OFF / WO / WH on said dates. I need to 
> transpose them so it would look like this : 
> 
> <image.png>
> 
> I am still new to pandas. Can you guide me on how to produce this? I am 
> reading about melt() and set_index() but I am not sure if they are the 
> correct functions to use.
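
On the pandas side of the quoted question, melt() is indeed the usual tool for this reshape; set_index() isn't needed. A rough sketch, assuming the same file name and column layout as above:

import pandas as pd

df = pd.read_csv("your_file.csv")

id_cols = ["Employee ID", "Name", "Client", "Project", "Team"]
date_cols = [c for c in df.columns if "/" in c]

# Unpivot the date columns into long form: one row per employee per date
long_df = df.melt(id_vars=id_cols, value_vars=date_cols,
                  var_name="Date", value_name="Status")

# Count employees per date/status, then spread statuses into columns
result = long_df.groupby(["Date", "Status"]).size().unstack(fill_value=0)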
