I solved this by using a Window partitioned by 'id'. I used lead and lag to create columns, which contained nulls in the places that I needed to delete, in each fold. I then removed those rows with the nulls and my additional columns.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Large-where-clause-StackOverflow-1-5-2-tp27544p27559.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org