Hello all,

I am Kai Jiang, a master's student majoring in Computer Science. My interests are machine learning and distributed systems, which is why I have been contributing to the Spark codebase since last year. My pull requests relate to MLlib, PySpark and SQL (https://github.com/apache/spark/pulls/vectorijk).
This year, I would really like to extend my contributions to Spark into a GSoC project. Although this year's list of GSoC organizations hasn't been announced yet, it is highly likely that the Apache Software Foundation will be accepted, judging by the lists from previous years. So I was wondering whether there are specific ideas, issues or suggestions regarding MLlib, SQL or other components that could be gathered into a project.

I also noticed that Spark 2.0 will be a major release in the near future. After looking into the MLlib 2.0 Roadmap <https://issues.apache.org/jira/browse/SPARK-12626>, I found many issues I am interested in (e.g. Python/SparkR API for ML, PMML export, etc.). If the community has other ideas, I am very willing to work on some issues before GSoC and get started on something new during GSoC.

Looking forward to hearing from you!

Best,
Kai
github.com/vectorijk