Hi everyone,

During my most recent internship, I worked extensively with Apache Spark,
integrating it into a company's data analytics platform. I've now become
interested in contributing to Apache Spark.

I'm returning to undergraduate studies in January, and one of my courses
is a standalone software engineering project. I was thinking that a
contribution to Apache Spark would satisfy my curiosity, continue to
support the company I interned at, and earn the academic credits I need
to graduate, all at the same time. It seems like too good an opportunity
to pass up.

With that in mind, I have the following questions:

   1. At this point, is there a self-contained project that I could work
   on within Spark? Ideally, I would work on it independently over a
   three-month time frame. That time also needs to accommodate ramping up
   on the Spark codebase and adjusting to the Scala language and its
   paradigms; the company I interned at primarily used the Java APIs. The
   output needs to be a technical report describing the project
   requirements and the design process I followed to engineer a solution
   to those requirements. In particular, it cannot just be a series of
   haphazard patches.
   2. How can I get started with contributing to Spark?
   3. Is there a high-level UML or some other design specification for the
   Spark architecture?

Thanks! I hope to be of some help =)

-Matt Cheah
