Oh, the annual event...

On Fri, Apr 1, 2016 at 4:37 PM, Xiao Li <gatorsm...@gmail.com> wrote:
> April 1st... : )
>
> 2016-04-01 0:33 GMT-07:00 Michael Malak <michaelma...@yahoo.com.invalid>:
>
>> I see you've been burning the midnight oil.
>>
>> ------------------------------
>> *From:* Reynold Xin <r...@databricks.com>
>> *To:* "dev@spark.apache.org" <dev@spark.apache.org>
>> *Sent:* Friday, April 1, 2016 1:15 AM
>> *Subject:* [discuss] using deep learning to improve Spark
>>
>> Hi all,
>>
>> Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
>>
>> I'd like to bring your attention to a project called DeepSpark that we
>> have been working on for the past three years. We realized that scaling
>> software development was challenging. A large fraction of software
>> engineering has been manual and mundane: writing test cases, fixing bugs,
>> implementing features according to specs, and reviewing pull requests. So
>> we started this project to see how much we could automate.
>>
>> After three years of development and one year of testing, we now have
>> enough confidence that this could work well in practice. For example, Matei
>> confessed to me today: "It looks like DeepSpark has a better understanding
>> of Spark internals than I ever will. It updated several pieces of code I
>> wrote long ago that even I no longer understood."
>>
>> I think it's time to discuss as a community how we want to continue
>> this project to ensure Spark is stable, secure, and easy to use yet able to
>> progress as fast as possible. I'm still working on a more formal design
>> doc, and it might take a little bit more time since I haven't been able to
>> fully grasp DeepSpark's capabilities yet. Based on my understanding right
>> now, I've written a blog post about DeepSpark here:
>> https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-spark.html
>>
>> Please take a look and share your thoughts. Obviously, this is an
>> ambitious project and could take many years to fully implement. One major
>> challenge is cost. The current Spark Jenkins infrastructure provided by the
>> AMPLab has only 8 machines, but DeepSpark uses 12000 machines. I'm not sure
>> whether AMPLab or Databricks can fund DeepSpark's operation for a long
>> period of time. Perhaps AWS can help out here. Let me know if you have
>> other ideas.

-- 
---
Takeshi Yamamuro