Hi everyone,

During my most recent internship, I worked extensively with Apache Spark, integrating it into the company's data analytics platform. I've since become interested in contributing to Apache Spark.
I'm returning to undergraduate studies in January, and there is an academic course that consists of a standalone software engineering project. I was thinking that a contribution to Apache Spark would satisfy my curiosity, help me continue supporting the company I interned at, and earn the academic credits I need to graduate, all at the same time. It seems like too good an opportunity to pass up. With that in mind, I have the following questions:

1. At this point, is there a self-contained project that I could work on within Spark? Ideally, I would work on it independently, in about a three-month time frame. This time also needs to accommodate ramping up on the Spark codebase and adjusting to the Scala programming language and its paradigms; the company I worked at primarily used the Java APIs. The output needs to be a technical report describing the project requirements and the design process I followed to engineer a solution to those requirements. In particular, it cannot just be a series of haphazard patches.

2. How can I get started with contributing to Spark?

3. Is there a high-level UML diagram or some other design specification for the Spark architecture?

Thanks! I hope to be of some help =)

-Matt Cheah