I am using PySpark SQL to run a SQL script that uses a window function
(LEAD) to pull data from the next row into the current row. It works
reliably in Jupyter in VS Code using PySpark 3.0.0 from Anaconda, but it
produces different results on every run on AWS EMR using Spark 2.4.4.
Why? Are there any known bugs with subqueries or window functions in
PySpark 2.4?
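
To make the symptom concrete, here is a small plain-Python sketch (not Spark code) that emulates LEAD over a partition. It is only an illustration under assumptions: the column names (id, grp, ts, val), the data, and the helper lead_by_partition are all hypothetical and not taken from the actual script. It shows one way run-to-run differences could arise, namely when the window's ORDER BY key has ties and rows arrive in a different order; whether that applies to the EMR job is not known.

```python
from itertools import groupby
from operator import itemgetter


def lead_by_partition(rows, partition_key, order_keys, value_key):
    """Emulate LEAD(value) OVER (PARTITION BY ... ORDER BY ...) on a list of dicts.

    Hypothetical helper for illustration only; this is not a Spark API.
    """
    # Sort by partition first, then by the ordering columns.
    ordered = sorted(
        rows,
        key=lambda r: (r[partition_key],) + tuple(r[k] for k in order_keys),
    )
    out = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        group = list(group)
        # Pair each row with the following row; the last row has no successor.
        for current, nxt in zip(group, group[1:] + [None]):
            out.append({**current, "next_value": nxt[value_key] if nxt else None})
    return out


# Hypothetical data: rows 2 and 3 share the same ts within one partition.
rows = [
    {"id": 1, "grp": "a", "ts": 10, "val": "x"},
    {"id": 2, "grp": "a", "ts": 20, "val": "y"},
    {"id": 3, "grp": "a", "ts": 20, "val": "z"},
]
# Same rows in a different arrival order, as might happen across cluster runs.
shuffled = [rows[2], rows[0], rows[1]]

# Ordering by ts alone leaves the tie unresolved, so the result follows arrival order.
ambiguous_1 = lead_by_partition(rows, "grp", ["ts"], "val")
ambiguous_2 = lead_by_partition(shuffled, "grp", ["ts"], "val")

# Adding a unique tie-breaker (id) makes the ordering total.
stable_1 = lead_by_partition(rows, "grp", ["ts", "id"], "val")
stable_2 = lead_by_partition(shuffled, "grp", ["ts", "id"], "val")
```

In this sketch the two "ambiguous" results disagree with each other while the two "stable" results agree, so a non-unique ORDER BY in the window is one thing that may be worth ruling out.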
