Hi all,

I’m interested in hearing the community’s thoughts on best practices for integration testing of Spark SQL jobs. We run many of our jobs on cloud infrastructure with HDFS, which makes debugging a challenge, especially for problems that do not reproduce when we simply initialize a SparkSession locally or test with spark-shell. Ideally, we’d like something like a Docker container that emulates HDFS and Spark cluster mode and can be run locally.
Are there any test frameworks, tips, or examples people can share?

Thanks!

--
Cheers,
Ruijing Li