Re: [discuss] using deep learning to improve Spark

Michael Malak Fri, 01 Apr 2016 00:33:35 -0700

I see you've been burning the midnight oil.

From: Reynold Xin <r...@databricks.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Sent: Friday, April 1, 2016 1:15 AM
Subject: [discuss] using deep learning to improve Spark

Hi all,
Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
I'd like to bring your attention to a project called DeepSpark that we have 
been working on for the past three years. We realized that scaling software 
development was challenging. A large fraction of software engineering has been 
manual and mundane: writing test cases, fixing bugs, implementing features 
according to specs, and reviewing pull requests. So we started this project to 
see how much we could automate.
After three years of development and one year of testing, we now have enough 
confidence that this could work well in practice. For example, Matei confessed 
to me today: "It looks like DeepSpark has a better understanding of Spark 
internals than I ever will. It updated several pieces of code I wrote long ago 
that even I no longer understood."


I think it's time to discuss as a community how we want to continue this 
project to ensure Spark is stable, secure, and easy to use yet able to progress 
as fast as possible. I'm still working on a more formal design doc, and it 
might take a little bit more time since I haven't been able to fully grasp 
DeepSpark's capabilities yet. Based on my understanding right now, I've written 
a blog post about DeepSpark here: 
https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-spark.html

Please take a look and share your thoughts. Obviously, this is an ambitious 
project and could take many years to fully implement. One major challenge is 
cost. The current Spark Jenkins infrastructure provided by the AMPLab has only 
8 machines, but DeepSpark uses 12000 machines. I'm not sure whether AMPLab or 
Databricks can fund DeepSpark's operation for a long period of time. Perhaps 
AWS can help out here. Let me know if you have other ideas.
