Hi David,

W00t indeed, and great questions. On the notebook front, there are two options depending on what you're looking for: you can either go with IPython 3 with spark-kernel as a backend, or you can use spark-notebook. Both have interesting tradeoffs.
If you're looking for a single notebook platform for your data scientists that has R and Python as well as a Spark shell, you'll likely want to go with IPython + spark-kernel. The downsides of the spark-kernel project are that data visualization isn't quite there yet, and it's still early days for documentation, blogs, etc. The upsides are that R and Python work beautifully and that the IPython committers are super-helpful. If you're OK with a primarily Spark/Scala experience, then I suggest you go with spark-notebook. The upsides are that the project is a little further along, its visualization support is better than spark-kernel's (though not as good as IPython with Python), and the committer is awesome with help. The downside is that you won't get R and Python. FWIW, I'm using both at the moment! Hope that helps. Rough sketches of the Accumulo wiring and the API-extension pattern from your two questions follow below your quoted message.

*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Wed, Mar 18, 2015 at 5:45 PM, davidh <dav...@annaisystems.com> wrote:

> hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and
> scanning through this archive with only moderate success. In other words,
> this is my way of saying sorry if this is answered somewhere obvious and I
> missed it :-)
>
> I've been tasked with figuring out how to connect Notebook, Spark, and
> Accumulo together. The end user will do her work via the notebook. Thus far,
> I've successfully set up a Vagrant image containing Spark, Accumulo, and
> Hadoop. I was able to use some of the Accumulo example code to create a
> table populated with data, and to write a simple Scala program that, when
> fired off to Spark via spark-submit, connects to Accumulo and prints the
> first ten rows of data in the table. So w00t on that -- but now I'm left
> with more questions:
>
> 1) I'm still stuck on what's considered 'best practice' in terms of hooking
> all this together. Let's say Sally, a user, wants to do some analytic work
> on her data. She pecks the appropriate commands into the notebook and fires
> them off. How does this get wired together on the back end? Do I, from the
> notebook, use spark-submit to send a job to Spark and let Spark worry about
> hooking into Accumulo, or is it preferable to create some kind of open
> stream between the two?
>
> 2) If I want to extend Spark's API, do I need to first submit an endless
> job via spark-submit that does something like what this gentleman describes
> <http://blog.madhukaraphatak.com/extending-spark-api> ? Is there an
> alternative (other than refactoring Spark's source) that doesn't involve
> extending the API via a job submission?
>
> Ultimately, what I'm looking for is help locating docs, blogs, etc. that
> may shed some light on this.
>
> t/y in advance!
>
> d
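For reference, here is a minimal sketch of the Spark-to-Accumulo read described in question 1, written as a notebook cell or spark-shell snippet. It assumes an Accumulo 1.6-era InputFormat API and an already-created SparkContext sc (which is what the notebook kernels provide); the instance name, ZooKeeper host, table name, and credentials are placeholders.

import org.apache.accumulo.core.client.ClientConfiguration
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.client.security.tokens.PasswordToken
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.mapreduce.Job

// The Hadoop Job object is used only to carry the InputFormat configuration;
// it is never actually submitted to Hadoop.
val job = Job.getInstance()
AccumuloInputFormat.setConnectorInfo(job, "root", new PasswordToken("secret"))
AccumuloInputFormat.setZooKeeperInstance(job,
  ClientConfiguration.loadDefault().withInstance("myinstance").withZkHosts("localhost:2181"))
AccumuloInputFormat.setInputTableName(job, "mytable")

// Each Accumulo tablet becomes a Spark partition of (Key, Value) pairs.
val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
  classOf[AccumuloInputFormat], classOf[Key], classOf[Value])

// Convert to plain strings before pulling rows back to the driver, since
// Accumulo's Key/Value types are not java.io.Serializable.
rdd.map { case (k, v) => (k.getRow.toString, new String(v.get)) }
  .take(10)
  .foreach(println)

In this shape the notebook's own SparkContext does the work against Accumulo, so there is no separate spark-submit step; the driver (the notebook kernel) just needs the Accumulo client jars on its classpath.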
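On the API-extension question, one common way to add methods to RDDs without touching Spark's source or keeping a job running is Scala's implicit-class ("enrich my library") pattern, along the lines of what the linked blog post describes. A rough sketch; the object, class, and method names here are made up for illustration:

import org.apache.spark.rdd.RDD

// Package this in a small library jar and put it on the driver's classpath
// (e.g. via --jars or the notebook's dependency mechanism).
object CustomFunctions {
  implicit class StringRDDOps(rdd: RDD[String]) {
    // Example extension method: count the lines containing a given term.
    def countMatches(term: String): Long = rdd.filter(_.contains(term)).count()
  }
}

// Usage from a notebook cell or any driver program (the path is illustrative):
//   import CustomFunctions._
//   val hits = sc.textFile("hdfs:///some/logs").countMatches("ERROR")

The implicit is resolved at compile time in the driver code, so it works the same whether that code lives in a notebook cell, the spark-shell, or a jar handed to spark-submit.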