I forgot to mention that there are also Zeppelin and jove-notebook, but I
haven't got any experience with those yet.


*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

> Hi David,
>
> W00t indeed and great questions. On the notebook front, there are two
> options depending on what you are looking for. You can either go with
> IPython 3 with the Spark Kernel as a backend, or you can use spark-notebook.
> Both have interesting tradeoffs.
>
> If you are looking for a single notebook platform for your data
> scientists that has R and Python as well as a Spark shell, you'll likely
> want to go with IPython + Spark Kernel. Downsides with the spark-kernel
> project are that data visualization isn't quite there yet and it's early
> days for documentation, blogs, etc. The upside is that R and Python work
> beautifully and that the IPython committers are super-helpful.
>
> If you are OK with a primarily Spark/Scala experience, then I suggest you
> go with spark-notebook. Upsides are that the project is a little further
> along, visualization support is better than the Spark Kernel's (though not
> as good as IPython with Python), and the committer is awesome about
> helping. The downside is that you won't get R and Python.
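>
> To give a feel for the day-to-day experience: either way, using Spark
> from a notebook is just typing code into a cell. A trivial sketch
> (assuming the kernel pre-binds a SparkContext, conventionally named sc -
> check each project's docs for the exact binding):
>
>     // count the multiples of 3 in 1..100000 across the cluster
>     val multiples = sc.parallelize(1 to 100000).filter(_ % 3 == 0).count()
>     println(s"found $multiples multiples of 3")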
>
> FWIW: I'm using both at the moment!
>
> Hope that helps.
>
>
> *Irfan Ahmad*
> CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
> Best of VMworld Finalist
> Best Cloud Management Award
> NetworkWorld 10 Startups to Watch
> EMA Most Notable Vendor
>
> On Wed, Mar 18, 2015 at 5:45 PM, davidh <dav...@annaisystems.com> wrote:
>
>> hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and
>> scanning through this archive with only moderate success. In other words --
>> my way of saying sorry if this is answered somewhere obvious and I missed
>> it :-)
>>
>> i've been tasked with figuring out how to connect Notebook, Spark, and
>> Accumulo together. The end user will do her work via notebook. Thus far,
>> I've successfully set up a Vagrant image containing Spark, Accumulo, and
>> Hadoop. I was able to use some of the Accumulo example code to create a
>> table populated with data, and to create a simple program in Scala that,
>> when fired off to Spark via spark-submit, connects to Accumulo and prints
>> the first ten rows of data in the table (rough sketch near the bottom of
>> this message). So w00t on that - but now I'm left with more questions:
>>
>> 1) I'm still stuck on what's considered 'best practice' in terms of
>> hooking all this together. Let's say Sally, a user, wants to do some
>> analytic work on her data. She pecks the appropriate commands into
>> notebook and fires them off. How does this get wired together on the back
>> end? Do I, from notebook, use spark-submit to send a job to Spark and let
>> Spark worry about hooking into Accumulo, or is it preferable to create
>> some kind of open stream between the two?
>>
>> 2) If I want to extend Spark's API, do I need to first submit an endless
>> job via spark-submit that does something like what this gentleman describes
>> <http://blog.madhukaraphatak.com/extending-spark-api> ? Is there an
>> alternative (other than refactoring Spark's source) that doesn't involve
>> extending the API via a job submission?
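>>
>> (For context, by 'extending the API' I mean roughly the implicit-conversion
>> pattern that post describes - the same trick Spark itself uses for
>> PairRDDFunctions. A hypothetical sketch, with made-up names:
>>
>>     import org.apache.spark.rdd.RDD
>>
>>     object StringRDDExtensions {
>>       // adds a custom grepCount operation to any RDD[String]
>>       implicit class StringRDDFunctions(rdd: RDD[String]) {
>>         def grepCount(needle: String): Long =
>>           rdd.filter(_.contains(needle)).count()
>>       }
>>     }
>>
>> ...usable after an import StringRDDExtensions._ inside a job.)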
>>
>> Ultimately, what I'm looking for is help locating docs, blogs, etc. that
>> may shed some light on this.
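>>
>> For concreteness, here's roughly the shape of my working test job (treat
>> it as a sketch: the instance/user/table names are from my Vagrant sandbox
>> and error handling is trimmed):
>>
>>     import org.apache.accumulo.core.client.ClientConfiguration
>>     import org.apache.accumulo.core.client.mapreduce.{
>>       AbstractInputFormat, AccumuloInputFormat, InputFormatBase}
>>     import org.apache.accumulo.core.client.security.tokens.PasswordToken
>>     import org.apache.accumulo.core.data.{Key, Value}
>>     import org.apache.hadoop.mapreduce.Job
>>     import org.apache.spark.{SparkConf, SparkContext}
>>
>>     object AccumuloDump {
>>       def main(args: Array[String]): Unit = {
>>         val sc = new SparkContext(new SparkConf().setAppName("accumulo-dump"))
>>
>>         // configure AccumuloInputFormat the same way a MapReduce job
>>         // would. (the setters are Java statics; Scala 2 can't reach
>>         // inherited statics through AccumuloInputFormat, hence calling
>>         // them on the defining classes.)
>>         val job = Job.getInstance(sc.hadoopConfiguration)
>>         AbstractInputFormat.setConnectorInfo(job, "root",
>>           new PasswordToken("secret"))
>>         AbstractInputFormat.setZooKeeperInstance(job,
>>           ClientConfiguration.loadDefault()
>>             .withInstance("miniInstance").withZkHosts("localhost:2181"))
>>         InputFormatBase.setInputTableName(job, "testtable")
>>
>>         // each Accumulo tablet becomes one input split / RDD partition
>>         val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
>>           classOf[AccumuloInputFormat], classOf[Key], classOf[Value])
>>
>>         // render to strings on the executors before collecting;
>>         // Key and Value are Writables, not Java-serializable
>>         rdd.map { case (k, v) => s"$k -> $v" }.take(10).foreach(println)
>>         sc.stop()
>>       }
>>     }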
>>
>> t/y in advance!
>>
>> d
>>
>>
>>
>
