Once you set up spark-notebook, it will handle the submits for interactive
work. It doesn't handle non-interactive work; for that, spark-kernel could
be used.
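
For the non-interactive case, a plain spark-submit of your assembled jar is
the usual route. A minimal sketch (the class name, jar path, and master
setting here are invented placeholders, not from any real project):

    spark-submit \
      --class com.example.MyBatchJob \
      --master local[*] \
      target/my-batch-job-assembly.jar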

Give it a shot ... it only takes 5 minutes to get it running in local mode.


*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Thu, Mar 19, 2015 at 9:51 AM, David Holiday <dav...@annaisystems.com>
wrote:

>  hi all - thx for the alacritous replies! so regarding how to get things
> from notebook to spark and back, am I correct that spark-submit is the way
> to go?
>
> DAVID HOLIDAY
>  Software Engineer
>  760 607 3300 | Office
>  312 758 8385 | Mobile
>  dav...@annaisystems.com <broo...@annaisystems.com>
>
>
>
> www.AnnaiSystems.com
>
>  On Mar 19, 2015, at 1:14 AM, Paolo Platter <paolo.plat...@agilelab.it>
> wrote:
>
>   Yes, I would suggest spark-notebook too.
> It's very simple to set up and it's growing pretty fast.
>
> Paolo
>
> Sent from my Windows Phone
>  ------------------------------
> From: Irfan Ahmad <ir...@cloudphysics.com>
> Sent: 19/03/2015 04:05
> To: davidh <dav...@annaisystems.com>
> Cc: user@spark.apache.org
> Subject: Re: iPython Notebook + Spark + Accumulo -- best practice?
>
>  I forgot to mention that there are also Zeppelin and jove-notebook, but I
> haven't got any experience with those yet.
>
>
>  *Irfan Ahmad*
> CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com/>
> Best of VMworld Finalist
>  Best Cloud Management Award
>  NetworkWorld 10 Startups to Watch
> EMA Most Notable Vendor
>
> On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad <ir...@cloudphysics.com>
> wrote:
>
>> Hi David,
>>
>>  W00t indeed and great questions. On the notebook front, there are two
>> options depending on what you are looking for. You can either go with
>> iPython 3 with Spark-kernel as a backend or you can use spark-notebook.
>> Both have interesting tradeoffs.
>>
>>  If you are looking for a single notebook platform for your data
>> scientists that has R and Python as well as a Spark shell, you'll likely
>> want to go with iPython + Spark-kernel. Downsides with the spark-kernel
>> project are that data visualization isn't quite there yet, and it's early
>> days for documentation, blogs, etc. Upsides are that R and Python work
>> beautifully and that the ipython committers are super-helpful.
>>
>>  If you are OK with a primarily spark/scala experience, then I suggest
>> you go with spark-notebook. Upsides are that the project is a little
>> further along, visualization support is better than spark-kernel's
>> (though not as good as iPython with Python), and the committer is awesome
>> with help. Downside is that you won't get R and Python.
>>
>>  FWIW: I'm using both at the moment!
>>
>>  Hope that helps.
>>
>>
>>  *Irfan Ahmad*
>> CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com/>
>> Best of VMworld Finalist
>>  Best Cloud Management Award
>>  NetworkWorld 10 Startups to Watch
>> EMA Most Notable Vendor
>>
>> On Wed, Mar 18, 2015 at 5:45 PM, davidh <dav...@annaisystems.com> wrote:
>>
>>> hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and
>>> scanning through this archive with only moderate success. in other words
>>> -- my way of saying sorry if this is answered somewhere obvious and I
>>> missed it :-)
>>>
>>> i've been tasked with figuring out how to connect Notebook, Spark, and
>>> Accumulo together. The end user will do her work via notebook. thus far,
>>> I've successfully set up a Vagrant image containing Spark, Accumulo, and
>>> Hadoop. I was able to use some of the Accumulo example code to create a
>>> table populated with data, and to write a simple program in scala that,
>>> when fired off to Spark via spark-submit, connects to accumulo and
>>> prints the first ten rows of data in the table. so w00t on that -- but
>>> now I'm left with more questions:
>>>
>>> 1) I'm still stuck on what's considered 'best practice' in terms of
>>> hooking all this together. Let's say Sally, a user, wants to do some
>>> analytic work on her data. She pecks the appropriate commands into
>>> notebook and fires them off. how does this get wired together on the
>>> back end? Do I, from notebook, use spark-submit to send a job to spark
>>> and let spark worry about hooking into accumulo, or is it preferable to
>>> create some kind of open stream between the two?
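>>>
>>> (for reference, the "let spark hook into accumulo" path in my working
>>> spark-submit test looks roughly like the sketch below -- the instance,
>>> user, password, and table names are changed, the exact
>>> AccumuloInputFormat calls may differ by Accumulo version, and sc is the
>>> usual SparkContext:)
>>>
>>> import org.apache.accumulo.core.client.ClientConfiguration
>>> import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
>>> import org.apache.accumulo.core.client.security.tokens.PasswordToken
>>> import org.apache.accumulo.core.data.{Key, Value}
>>> import org.apache.hadoop.mapreduce.Job
>>>
>>> // connection details below are placeholders
>>> val job = Job.getInstance()
>>> AccumuloInputFormat.setConnectorInfo(job, "root", new PasswordToken("secret"))
>>> AccumuloInputFormat.setZooKeeperInstance(job,
>>>   new ClientConfiguration().withInstance("myInstance").withZkHosts("localhost:2181"))
>>> AccumuloInputFormat.setInputTableName(job, "myTable")
>>>
>>> // hand the input format to Spark and print the first ten rows
>>> val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
>>>   classOf[AccumuloInputFormat], classOf[Key], classOf[Value])
>>> rdd.take(10).foreach(println)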
>>>
>>> 2) if I want to extend spark's api, do I need to first submit an endless
>>> job via spark-submit that does something like what this gentleman
>>> describes <http://blog.madhukaraphatak.com/extending-spark-api> ? is
>>> there an alternative (other than refactoring spark's source) that
>>> doesn't involve extending the api via a job submission?
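>>>
>>> (if it helps frame the question: my reading of that post is that the
>>> extension boils down to an implicit class wrapping RDD, along these
>>> lines -- the wrapper and operator names here are made up, and it
>>> compiles into the application or pastes into the shell with no separate
>>> job submission:)
>>>
>>> import org.apache.spark.rdd.RDD
>>>
>>> object CustomFunctions {
>>>   // adds a made-up operator to every RDD via an implicit class
>>>   implicit class RichRDD[T](rdd: RDD[T]) {
>>>     def countDistinct(): Long = rdd.distinct().count()
>>>   }
>>> }
>>>
>>> // usage: import CustomFunctions._ , then myRdd.countDistinct()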
>>>
>>> ultimately, what I'm looking for is help locating docs, blogs, etc. that
>>> may shed some light on this.
>>>
>>> t/y in advance!
>>>
>>> d
>>>
>>>
>>>
>>
>
>
