Hi all, I'm working on an ETL task with Spark. As part of this work, I'd like to mark records with some info such as:
1. Whether the record is good or bad (e.g., as an Either)
2. The originating file and line numbers

Part of my motivation is to prevent errors in individual records from stopping the entire pipeline. I'd also like to filter out and log bad records at various stages. I could use RDD[Either[E, T]] for everything, but that won't work for DataFrames.

Has anyone dealt with a similar situation and found an elegant way to handle it?

Thanks,
RJ
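
P.S. For concreteness, here is a minimal sketch of the Either-based tagging I have in mind. It is only an illustration: a plain Scala Seq stands in for the RDD so it runs without Spark, and the names Origin, Tagged, parseLine, and input.txt are all hypothetical.

```scala
import scala.util.Try

object TaggedDemo {
  // Hypothetical types: where a record came from, plus its good/bad status.
  final case class Origin(file: String, line: Long)
  final case class Tagged[A](origin: Origin, value: Either[String, A])

  // Parse one raw line as an Int, capturing a failure as Left instead of throwing.
  def parseLine(file: String, lineNo: Long, raw: String): Tagged[Int] = {
    val parsed: Either[String, Int] =
      Try(raw.trim.toInt).toEither.left.map(e => s"${e.getClass.getSimpleName}: ${e.getMessage}")
    Tagged(Origin(file, lineNo), parsed)
  }

  // Assumption: a Seq stands in for an RDD of raw input lines.
  val rawLines: Seq[String] = Seq("1", "2", "oops", "4")

  // Tag every line with its (1-based) line number and parse result.
  val tagged: Seq[Tagged[Int]] =
    rawLines.zipWithIndex.map { case (raw, idx) => parseLine("input.txt", idx + 1L, raw) }

  // Split into good values and bad records (with origin) for logging.
  val good: Seq[Int] = tagged.collect { case Tagged(_, Right(v)) => v }
  val bad: Seq[(Origin, String)] = tagged.collect { case Tagged(o, Left(err)) => (o, err) }

  def main(args: Array[String]): Unit = {
    bad.foreach { case (o, err) => println(s"bad record at ${o.file}:${o.line}: $err") }
    println(s"good records: ${good.mkString(", ")}")
  }
}
```

The map and collect calls used here also exist on RDD, so the same shape should carry over to an RDD of Tagged values; it is the DataFrame side that I don't have a good answer for.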