Hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success -- in other words, this is my way of saying sorry if this is answered somewhere obvious and I missed it :-)
I've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via the notebook.

So far I've successfully set up a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, and to write a simple Scala program that, when fired off to Spark via spark-submit, connects to Accumulo and prints the first ten rows of data in the table (rough sketch in the P.S. below). So w00t on that -- but now I'm left with more questions:

1) I'm still stuck on what's considered 'best practice' for hooking all of this together. Say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into the notebook and fires them off. How does this get wired together on the back end? Do I, from the notebook, use spark-submit to send a job to Spark and let Spark worry about hooking into Accumulo, or is it preferable to keep some kind of open stream between the two (also sketched in the P.S.)?

2) If I want to extend Spark's API, do I first need to submit an endless job via spark-submit that does something like what this gentleman describes: http://blog.madhukaraphatak.com/extending-spark-api ? Is there an alternative (other than refactoring Spark's source) that doesn't involve extending the API via a job submission? (A rough sketch of the kind of extension I mean is also in the P.S.)

Ultimately, what I'm looking for is help locating docs, blogs, etc. that may shed some light on this. t/y in advance!

d
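P.S. A few rough sketches to make the above concrete.

First, here's roughly what my working Scala job looks like -- reconstructed from memory, so treat it as a sketch rather than the exact code. The instance name, zookeepers, credentials, and table name are all placeholders, and it assumes the Accumulo 1.6-style mapreduce API:

import org.apache.accumulo.core.client.ClientConfiguration
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.client.security.tokens.PasswordToken
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object AccumuloFirstTen {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("accumulo-first-ten"))

    // the Job object is only used as a carrier for the Hadoop Configuration
    val job = Job.getInstance()
    AccumuloInputFormat.setConnectorInfo(job, "root", new PasswordToken("secret"))
    AccumuloInputFormat.setZooKeeperInstance(job,
      ClientConfiguration.loadDefault()
        .withInstance("myinstance")
        .withZkHosts("localhost:2181"))
    AccumuloInputFormat.setInputTableName(job, "mytable")

    // each row comes back as a (Key, Value) pair
    val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[AccumuloInputFormat], classOf[Key], classOf[Value])

    rdd.take(10).foreach { case (k, v) => println(s"$k -> $v") }
    sc.stop()
  }
}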
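Second, by an "open stream" in question 1 I mean something like the notebook's kernel process acting as the Spark driver and holding one long-lived SparkContext, so each cell Sally runs is just another transformation/action on that context instead of a fresh spark-submit per command. A minimal sketch of that idea (the master URL and HDFS path are made up):

import org.apache.spark.{SparkConf, SparkContext}

// created once when the notebook session starts
val conf = new SparkConf()
  .setAppName("sally-notebook-session")
  .setMaster("spark://spark-master:7077")
val sc = new SparkContext(conf)

// every subsequent cell just reuses sc, e.g.
val counts = sc.textFile("hdfs:///data/sample.txt")
  .flatMap(_.split("\\s+"))
  .map((_, 1))
  .reduceByKey(_ + _)
counts.take(10).foreach(println)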
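Third, for question 2: the kind of extension that blog post describes is (as I understand it) essentially the "pimp my library" pattern -- wrapping RDD in an implicit class that adds custom operators. A toy sketch of what I mean (everyNth is made up purely for illustration):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object CustomRddFunctions {
  // once this is imported, any RDD gains the extra operator
  implicit class RichRDD[T: ClassTag](rdd: RDD[T]) {
    // keep every nth element -- just something custom to call
    def everyNth(n: Int): RDD[T] =
      rdd.zipWithIndex().filter { case (_, i) => i % n == 0 }.map(_._1)
  }
}

// usage from a job or shell/notebook session:
//   import CustomRddFunctions._
//   sc.parallelize(1 to 100).everyNth(10).collect()

My question is whether code like that has to live inside a long-running job that I spark-submit, or whether there's some other way to make it available.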