Links that were helpful to me while learning about the Spark source code:
- Articles with the "spark" tag on this blog:
http://hydronitrogen.com/tag/spark.html
- Jacek's "Mastering Apache Spark" GitBook:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
Hope these help.
On Sat, Apr 8,
I'd like to eventually contribute to Spark, but I've noticed that since Spark 2
the query planner is used heavily throughout the Dataset code base. Are there
any sites I can go to that explain the technical details, rather than just
giving a high-level perspective?