Hi David,

W00t indeed, and great questions. On the notebook front, there are two
options depending on what you are looking for: you can either go with
IPython 3 with spark-kernel as a backend, or you can use spark-notebook.
Both have interesting tradeoffs.

If you're looking for a single notebook platform for your data scientists
that offers R and Python as well as a Spark shell, you'll likely want to go
with IPython + spark-kernel. Downsides of the spark-kernel project are that
data visualization isn't quite there yet and it's still early days for
documentation, blog posts, etc. Upsides are that R and Python work
beautifully and that the IPython committers are super-helpful.

If you are OK with a primarily Spark/Scala experience, then I suggest you go
with spark-notebook. Upsides are that the project is a little further
along, visualization support is better than spark-kernel's (though not as
good as IPython with Python), and the committer is awesome about helping.
Downside is that you won't get R or Python.
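
Either way, the day-to-day experience is just typing Spark code into a
cell. A minimal sketch of such a cell, assuming the kernel pre-binds a
SparkContext as sc the way the Spark shell does (check each project's docs
for the exact binding):

    // count words in a small in-memory collection, entirely inside the notebook
    val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))
    val counts = lines.flatMap(_.split("\\s+")).map(w => (w, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)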

FWIW: I'm using both at the moment!

Hope that helps.


*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Wed, Mar 18, 2015 at 5:45 PM, davidh <dav...@annaisystems.com> wrote:

> Hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and
> scanning through this archive with only moderate success. In other words --
> my way of saying sorry if this is answered somewhere obvious and I missed it
> :-)
>
> I've been tasked with figuring out how to connect Notebook, Spark, and
> Accumulo together. The end user will do her work via the notebook. Thus far,
> I've successfully set up a Vagrant image containing Spark, Accumulo, and
> Hadoop. I was able to use some of the Accumulo example code to create a
> table populated with data, and to write a simple program in Scala that, when
> fired off to Spark via spark-submit, connects to Accumulo and prints the
> first ten rows of data in the table. So w00t on that - but now I'm left with
> more questions:
>
> 1) I'm still stuck on what's considered 'best practice' in terms of hooking
> all this together. Let's say Sally, a user, wants to do some analytic work
> on her data. She pecks the appropriate commands into the notebook and fires
> them off. How does this get wired together on the back end? Do I, from the
> notebook, use spark-submit to send a job to Spark and let Spark worry about
> hooking into Accumulo, or is it preferable to create some kind of open
> stream between the two?
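
One common way to wire 1) together, for what it's worth: the notebook
process holds a live SparkContext, and that context reads Accumulo
directly through Accumulo's Hadoop InputFormat, so no spark-submit step
is needed from the notebook. A minimal sketch, with hypothetical
instance, user, and table names:

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.client.security.tokens.PasswordToken
    import org.apache.accumulo.core.data.{Key, Value}
    import org.apache.hadoop.mapreduce.Job

    // configure the input format (all names below are placeholders)
    val job = Job.getInstance()
    AccumuloInputFormat.setConnectorInfo(job, "sally", new PasswordToken("secret"))
    AccumuloInputFormat.setZooKeeperInstance(job, "myInstance", "zkhost:2181")
    AccumuloInputFormat.setInputTableName(job, "mytable")

    // build an RDD over the table, assuming an existing SparkContext sc
    val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[AccumuloInputFormat], classOf[Key], classOf[Value])

    // map Key/Value to plain strings before collecting (they aren't
    // java.io.Serializable), then print the first ten rows
    rdd.map { case (k, v) => (k.getRow.toString, new String(v.get)) }
       .take(10)
       .foreach(println)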
>
> 2) If I want to extend Spark's API, do I need to first submit an endless
> job via spark-submit that does something like what this gentleman describes
> <http://blog.madhukaraphatak.com/extending-spark-api>? Is there an
> alternative (other than refactoring Spark's source) that doesn't involve
> extending the API via a job submission?
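
On 2), if the goal is just to add your own operators on top of RDDs (in the
spirit of that post), plain Scala implicits are enough; there is no need for
a long-running job, only for the code to be on the classpath of whatever
process holds the SparkContext. A minimal, hypothetical sketch:

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    object RddExtensions {
      // adds a dropFirst(n) operator to any RDD via an implicit class
      implicit class RichRDD[T: ClassTag](rdd: RDD[T]) {
        def dropFirst(n: Int): RDD[T] =
          rdd.zipWithIndex().filter { case (_, i) => i >= n }.map(_._1)
      }
    }

    // usage, assuming an existing SparkContext sc:
    // import RddExtensions._
    // sc.parallelize(1 to 10).dropFirst(3).collect()   // Array(4, 5, ..., 10)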
>
> Ultimately, what I'm looking for is help locating docs, blogs, etc. that
> may shed some light on this.
>
> t/y in advance!
>
> d
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
