Hi all,

I'm working on an ETL task with Spark.  As part of this work, I'd like to
mark records with some info such as:

1. Whether the record is good or bad (e.g., via Either)
2. The originating file and line number(s)

Part of my motivation is to prevent errors with individual records from
stopping the entire pipeline.  I'd also like to filter out and log bad
records at various stages.

I could use RDD[Either[E, T]] for everything (E being some error type), but
that won't work for DataFrames.  Has anyone dealt with a similar situation
and found an elegant way to handle it?
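
For concreteness, here is a rough sketch of the kind of tagging I have in
mind, written against plain Scala collections rather than Spark so it stays
self-contained.  All names here (Provenance, Tagged, mapSafely, TaggedDemo,
input.csv) are made up for illustration and are not an existing Spark API.
The wrapper is deliberately flat (Option fields instead of Either) on the
assumption that this shape would map onto DataFrame columns more easily; I
have not verified that part.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical names throughout; none of this is an existing Spark API.

// Where a record came from.
final case class Provenance(file: String, line: Long)

// A flat, Either-like wrapper: a good record has `value` set and no `error`,
// a bad record has `error` set and no `value`.
final case class Tagged[T](value: Option[T], error: Option[String], origin: Provenance) {
  def isGood: Boolean = error.isEmpty

  // Apply a step to good records only. An exception marks the record as bad
  // instead of failing the whole job; bad records pass through unchanged.
  def mapSafely[U](f: T => U): Tagged[U] = value match {
    case Some(v) =>
      Try(f(v)) match {
        case Success(u) => Tagged[U](Some(u), None, origin)
        case Failure(e) => Tagged[U](None, Some(e.toString), origin)
      }
    case None => Tagged[U](None, error, origin)
  }
}

object TaggedDemo {
  // Stand-in for lines read from a file (hypothetical file name below).
  val lines: Seq[String] = Seq("1", "2", "oops", "4")

  // Tag each line with its origin; line numbers are 1-based.
  val tagged: Seq[Tagged[String]] = lines.zipWithIndex.map { case (s, i) =>
    Tagged[String](Some(s), None, Provenance("input.csv", i + 1L))
  }

  // One pipeline stage: parsing. The malformed line becomes a bad record.
  val parsed: Seq[Tagged[Int]] = tagged.map(_.mapSafely(_.trim.toInt))

  // Split off bad records so they can be logged and filtered out.
  val (good, bad) = parsed.partition(_.isGood)

  def main(args: Array[String]): Unit = {
    bad.foreach { r =>
      println(s"bad record at ${r.origin.file}:${r.origin.line}: ${r.error.getOrElse("")}")
    }
    println(good.flatMap(_.value))
  }
}
```

The same partition step could be repeated after each stage to log and drop
bad records as they appear, which is roughly the behaviour I'm after.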

Thanks,
RJ