April 1st... : )

2016-04-01 0:33 GMT-07:00 Michael Malak <michaelma...@yahoo.com.invalid>:
> I see you've been burning the midnight oil.
>
> ------------------------------
> *From:* Reynold Xin <r...@databricks.com>
> *To:* "dev@spark.apache.org" <dev@spark.apache.org>
> *Sent:* Friday, April 1, 2016 1:15 AM
> *Subject:* [discuss] using deep learning to improve Spark
>
> Hi all,
>
> Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
>
> I'd like to bring your attention to a project called DeepSpark that we
> have been working on for the past three years. We realized that scaling
> software development was challenging. A large fraction of software
> engineering has been manual and mundane: writing test cases, fixing bugs,
> implementing features according to specs, and reviewing pull requests. So
> we started this project to see how much we could automate.
>
> After three years of development and one year of testing, we now have
> enough confidence that this could work well in practice. For example, Matei
> confessed to me today: "It looks like DeepSpark has a better understanding
> of Spark internals than I ever will. It updated several pieces of code I
> wrote long ago that even I no longer understood."
>
> I think it's time to discuss as a community about how we want to continue
> this project to ensure Spark is stable, secure, and easy to use yet able to
> progress as fast as possible. I'm still working on a more formal design
> doc, and it might take a little bit more time since I haven't been able to
> fully grasp DeepSpark's capabilities yet. Based on my understanding right
> now, I've written a blog post about DeepSpark here:
> https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-spark.html
>
> Please take a look and share your thoughts. Obviously, this is an
> ambitious project and could take many years to fully implement. One major
> challenge is cost. The current Spark Jenkins infrastructure provided by the
> AMPLab has only 8 machines, but DeepSpark uses 12000 machines. I'm not sure
> whether AMPLab or Databricks can fund DeepSpark's operation for a long
> period of time. Perhaps AWS can help out here. Let me know if you have
> other ideas.