data = {
    "Employee ID": [12345, 12346, 12347, 12348, 12349],
    "Name": ["Dummy x", "Dummy y", "Dummy z", "Dummy a", "Dummy b"],
    "Client": ["Dummy a", "Dummy b", "Dummy c", "Dummy d", "Dummy e"],
    "Project": ["abc", "def", "ghi", "jkl", "mno"],
    "Team": ["team a", "team b", "team c", "team d", "team e"],
    # date columns ("01/01/2022", "02/01/2022", ...) holding values
    # like "OFF", "WO", "WH" would follow here
}
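For comparison, here is a minimal sketch of the same reshaping on the pandas side, assuming a wide frame like the dictionary above (the two date columns and their values are hypothetical placeholders, not the full dataset):

```python
import pandas as pd

# Hypothetical wide frame: ID columns plus two date columns (assumed sample data)
df = pd.DataFrame({
    "Employee ID": [12345, 12346],
    "Name": ["Dummy x", "Dummy y"],
    "01/01/2022": ["OFF", "WO"],
    "02/01/2022": ["WO", "WH"],
})

# melt turns every date column into (Date, Status) rows, keeping the ID columns
long = df.melt(id_vars=["Employee ID", "Name"], var_name="Date", value_name="Status")
print(long)
```

Each date column contributes one row per employee, so two employees and two date columns give four rows in the long frame.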
You can use selectExpr and stack to achieve the same effect in PySpark:
df = spark.read.csv("your_file.csv", header=True, inferSchema=True)

# Columns whose names contain "/" are the date columns
date_columns = [c for c in df.columns if "/" in c]
stack_expr = ", ".join(f"'{c}', `{c}`" for c in date_columns)

df = df.selectExpr(
    ["`Employee ID`", "`Name`", "`Client`", "`Project`", "`Team`"]
    + [f"stack({len(date_columns)}, {stack_expr}) as (Date, Status)"]
)
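To make the stack() call concrete, this is what the generated SQL expression looks like for a few hypothetical date columns (plain Python string-building, no Spark session needed):

```python
# Hypothetical date columns, matching the "/"-based detection above
date_columns = ["01/01/2022", "02/01/2022", "03/01/2022"]

# stack(n, 'label1', `col1`, 'label2', `col2`, ...) emits one (Date, Status)
# row per label/column pair
pairs = ", ".join(f"'{c}', `{c}`" for c in date_columns)
stack_sql = f"stack({len(date_columns)}, {pairs}) as (Date, Status)"
print(stack_sql)
```

The quoted 'label' becomes the Date value and the backtick-quoted column supplies the Status value, so each input row expands into one output row per date column.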
Hi,

This is currently my column definition:

Employee ID  Name     Client   Project  Team    01/01/2022  02/01/2022  03/01/2022  04/01/2022  05/01/2022
12345        Dummy x  Dummy a  abc      team a  OFF         WO          WH          WH          WH

As you can see, the outer columns are just dates.