I have a complex transformation requirement that I am implementing using DataFrames. It involves a lot of joins, including joins with a Cassandra table. I was wondering how I can debug the jobs and stages queued by Spark SQL, the way I can for RDDs.
In one case, Spark SQL creates more than 17 lakh (1.7 million) tasks for 2 GB of data. I have set the SQL partitions to 32.

Raghav