I needed the same thing for debugging, and I just added a "count" action in debug mode after every step I was interested in. It's very time-consuming, but I don't debug very often.
2016-10-20 2:17 GMT-07:00 Andreas Hechenberger <inter...@hechenberger.me>:

> Hey awesome Spark devs :)
>
> I am new to Spark and I have read a lot, but now I am stuck :( so please
> be kind if I ask silly questions.
>
> I want to analyze some algorithms and strategies in Spark, and for one
> experiment I want to know the size of the intermediate results between
> iterations/jobs. Some of them are written to disk and some are in the
> cache, I guess. I am not afraid of looking into the code (I already did),
> but it is complex and I have no clue where to start :( It would be nice
> if someone could point me in the right direction, or to where I can find
> more information about the structure of Spark core development :)
>
> I already set up the development environment and I can compile Spark. It
> was really awesome how smooth the setup was :) Thanks for that.
>
> Servus
> Andy
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
Sincerely yours,
Egor Pakhomov