Re: iPython Notebook + Spark + Accumulo -- best practice?
On Mar 25, 2015, at 5:27 PM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

Hmmm... this seems very Accumulo-specific, doesn't it? Not sure how to help with that.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/

On Tue, Mar 24, 2015 at 4:09 PM, David Holiday <dav...@annaisystems.com> wrote:

hi all,

got a vagrant image with spark-notebook, Spark, Accumulo, and Hadoop all running. from the notebook I can manually create a scanner and pull test data from a table I created using one of the Accumulo examples:

    val instanceNameS = "accumulo"
    val zooServersS = "localhost:2181"
    val instance: Instance = new ZooKeeperInstance(instanceNameS, zooServersS)
    val connector: Connector = instance.getConnector("root", new PasswordToken("password"))
    val auths = new Authorizations("exampleVis")
    val scanner = connector.createScanner("batchtest1", auths)

    scanner.setRange(new Range("row_00", "row_10"))

    for (entry: Entry[Key, Value] <- scanner) {
      println(entry.getKey + " is " + entry.getValue)
    }

will give the first ten rows of table data. when I try to create the RDD thusly:

    val rdd2 = sparkContext.newAPIHadoopRDD(
      new Configuration(),
      classOf[org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat],
      classOf[org.apache.accumulo.core.data.Key],
      classOf[org.apache.accumulo.core.data.Value]
    )

I get an RDD returned to me that I can't do much with, due to the following error:

    java.io.IOException: Input info has not been set.
        at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:630)
        at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:343)
        at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:538)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:220)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367)
        at org.apache.spark.rdd.RDD.count(RDD.scala:927)

which totally makes sense, given that I haven't specified any parameters about which table to connect to, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD?

DAVID HOLIDAY
Software Engineer
dav...@annaisystems.com | www.AnnaiSystems.com
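The missing piece is the InputFormat configuration: AccumuloInputFormat validates its "input info" (connector credentials, instance, table, auths) against the Hadoop Configuration it is handed, and the bare `new Configuration()` above carries none of it. Below is a minimal sketch of the wiring, assuming the Accumulo 1.6-era static setters on AccumuloInputFormat and a notebook where `sparkContext` is already in scope; the table, user, and auth values just mirror the snippet above.

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.client.security.tokens.PasswordToken
    import org.apache.accumulo.core.data.{Key, Range, Value}
    import org.apache.accumulo.core.security.Authorizations
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    // A Hadoop Job here is just a convenient holder for the Configuration
    // that AccumuloInputFormat reads its "input info" from.
    val job = Job.getInstance(new Configuration())

    // Who connects, and to which instance/ZooKeeper quorum. (This overload of
    // setZooKeeperInstance is deprecated in 1.6 in favor of a
    // ClientConfiguration variant, but it is still present.)
    AccumuloInputFormat.setConnectorInfo(job, "root", new PasswordToken("password"))
    AccumuloInputFormat.setZooKeeperInstance(job, "accumulo", "localhost:2181")

    // The actual "input info" the IOException complains about:
    // table name, scan authorizations, and (optionally) ranges.
    AccumuloInputFormat.setInputTableName(job, "batchtest1")
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations("exampleVis"))
    AccumuloInputFormat.setRanges(job, java.util.Collections.singleton(new Range("row_00", "row_10")))

    // Hand Spark the populated configuration instead of an empty one.
    val rdd2 = sparkContext.newAPIHadoopRDD(
      job.getConfiguration,
      classOf[AccumuloInputFormat],
      classOf[Key],
      classOf[Value]
    )

    rdd2.take(10).foreach { case (k, v) => println(k + " is " + v) }

The same Job-backed Configuration pattern applies on the write side with AccumuloOutputFormat.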
On Mar 19, 2015, at 11:25 AM, David Holiday <dav...@annaisystems.com> wrote:

kk - I'll put something together and get back to you with more :-)

On Mar 19, 2015, at 10:59 AM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

Once you set up spark-notebook, it'll handle the submits for interactive work. Non-interactive work is not handled by it; for that, spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local mode.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/

On Thu, Mar 19, 2015 at 9:51 AM, David Holiday <dav...@annaisystems.com> wrote:

hi all - thx for the alacritous replies! so regarding how to get things from notebook to Spark and back, am I correct that spark-submit is the way to go?
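For the non-interactive route, whatever gets handed to spark-submit is just a jar with a main class; here is a bare-bones sketch of such a driver (the object name AccumuloRowDump and the jar name are illustrative, and the Accumulo wiring is the same as in the snippet further up):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal non-interactive driver: everything the notebook does
    // interactively happens inside main() instead.
    object AccumuloRowDump {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("AccumuloRowDump"))
        // ... configure AccumuloInputFormat on a Hadoop Job as shown above,
        // then build and use the RDD:
        //   val rdd = sc.newAPIHadoopRDD(job.getConfiguration, ...)
        //   println(rdd.count())
        sc.stop()
      }
    }

Packaged into a jar, it would be launched with something like `spark-submit --class AccumuloRowDump --master local[*] my-job.jar`.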
On Mar 19, 2015, at 1:14 AM, Paolo Platter <paolo.plat...@agilelab.it> wrote:

Yes, I would suggest spark-notebook too. It's very simple to set up, and it's growing pretty fast.

Paolo
Sent from my Windows Phone

From: Irfan Ahmad <ir...@cloudphysics.com>
Sent: 19/03/2015 04:05
To: davidh <dav...@annaisystems.com>
Cc: user@spark.apache.org
Subject: Re: iPython Notebook + Spark + Accumulo -- best practice?

I forgot to mention that there are also Zeppelin and jove-notebook, but I haven't got any experience with those yet.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/
On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

Hi David,

W00t indeed, and great questions. On the notebook front, there are two options, depending on what you are looking for. You can either go with iPython 3 with spark-kernel as a backend, or you can use spark-notebook. Both have interesting tradeoffs.

If you are looking for a single notebook platform for your data scientists that has R and Python as well as a Spark shell, you'll likely want to go with iPython + spark-kernel. Downsides of the spark-kernel project are that data visualization isn't quite there yet, and it's early days for documentation, blogs, etc. The upside is that R and Python work beautifully and that the iPython committers are super-helpful.

If you are OK with a primarily Spark/Scala experience, then I suggest spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel's (though not as good as iPython with Python), and the committer is awesome with help. The downside is that you won't get R and Python.

FWIW: I'm using both at the moment! Hope that helps.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/

On Wed, Mar 18, 2015 at 5:45 PM, davidh <dav...@annaisystems.com> wrote:

hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-)

i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. the end user will do her work via notebook. thus far, I've successfully set up a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, and to write a simple program in Scala that, when fired off to Spark via spark-submit, connects to Accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions:

1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. let's say Sally, a user, wants to do some analytic work on her data. she pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? do I, from notebook, use spark-submit to send a job to Spark and let Spark worry about hooking into Accumulo, or is it preferable to create some kind of open stream between the two?

2) if I want to extend Spark's API, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes (http://blog.madhukaraphatak.com/extending-spark-api)? is there an alternative (other than refactoring Spark's source) that doesn't involve extending the API via a job submission? (see the sketch below)

ultimately, what I'm looking for is help locating docs, blogs, etc. that may shed some light on this. t/y in advance!

d

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html
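On question 2: the post linked above relies on Scala implicits to bolt new methods onto RDD at compile time, which doesn't require any long-running submitted job; the extension lives in whatever code imports it, whether that's a notebook cell or a jar handed to spark-submit. A rough sketch of the pattern follows (the RDDExtensions, RichRDD, and evens names are made up for illustration; the linked post goes further and defines custom RDD subclasses as well):

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    object RDDExtensions {
      // Implicit conversion: any RDD[T] silently gains the methods of RichRDD
      // wherever RDDExtensions._ is imported. No change to Spark's source and
      // no standing job is involved.
      implicit class RichRDD[T: ClassTag](rdd: RDD[T]) {
        // Hypothetical added operation: keep the elements at even positions.
        def evens(): RDD[T] =
          rdd.zipWithIndex().collect { case (x, i) if i % 2 == 0 => x }
      }
    }

    // Usage, e.g. from a notebook cell:
    //   import RDDExtensions._
    //   sc.parallelize(1 to 10).evens().collect()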
Re: iPython Notebook + Spark + Accumulo -- best practice?
broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone -- Da: Irfan Ahmad ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidh dav...@annaisystems.com Cc: user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. 
I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission? ultimately what I'm looking for help locating docs, blogs, etc that may shed some light on this. t/y in advance! d
Re: iPython Notebook + Spark + Accumulo -- best practice?
760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. 
Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? is there an alternative (other than refactoring spark's source
Re: iPython Notebook + Spark + Accumulo -- best practice?
(RDD.scala:220) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367) at org.apache.spark.rdd.RDD.count(RDD.scala:927) which totally makes sense in light of the fact that I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. 
Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo
Re: iPython Notebook + Spark + Accumulo -- best practice?
:14 AM, Paolo Platter paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone -- Da: Irfan Ahmad ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidh dav...@annaisystems.com Cc: user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? 
is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission? ultimately what I'm looking for help locating docs, blogs, etc that may shed some light on this. t/y in advance! d -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html Sent from the Apache Spark User List mailing list archive at Nabble.com http://nabble.com/. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: iPython Notebook + Spark + Accumulo -- best practice?
://www.annaisystems.com/ On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. 
Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need
Re: iPython Notebook + Spark + Accumulo -- best practice?
Hmmm this seems very accumulo-specific, doesn't it? Not sure how to help with that. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Tue, Mar 24, 2015 at 4:09 PM, David Holiday dav...@annaisystems.com wrote: hi all, got a vagrant image with spark notebook, spark, accumulo, and hadoop all running. from notebook I can manually create a scanner and pull test data from a table I created using one of the accumulo examples: val instanceNameS = accumuloval zooServersS = localhost:2181val instance: Instance = new ZooKeeperInstance(instanceNameS, zooServersS)val connector: Connector = instance.getConnector( root, new PasswordToken(password))val auths = new Authorizations(exampleVis)val scanner = connector.createScanner(batchtest1, auths) scanner.setRange(new Range(row_00, row_10)) for(entry: Entry[Key, Value] - scanner) { println(entry.getKey + is + entry.getValue)} will give the first ten rows of table data. when I try to create the RDD thusly: val rdd2 = sparkContext.newAPIHadoopRDD ( new Configuration(), classOf[org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat], classOf[org.apache.accumulo.core.data.Key], classOf[org.apache.accumulo.core.data.Value] ) I get an RDD returned to me that I can't do much with due to the following error: java.io.IOException: Input info has not been set. at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:630) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:343) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:538) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:220) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367) at org.apache.spark.rdd.RDD.count(RDD.scala:927) which totally makes sense in light of the fact that I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com www.AnnaiSystems.com On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! 
so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone -- Da: Irfan Ahmad ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidh dav...@annaisystems.com Cc: user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options
Re: iPython Notebook + Spark + Accumulo -- best practice?
-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? 
is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission? ultimately what I'm looking for help locating docs, blogs, etc that may shed some light on this. t/y in advance! d -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html Sent from the Apache Spark User List mailing list archive at Nabble.comhttp://nabble.com/. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
Re: iPython Notebook + Spark + Accumulo -- best practice?
hi all, got a vagrant image with spark notebook, spark, accumulo, and hadoop all running. from notebook I can manually create a scanner and pull test data from a table I created using one of the accumulo examples: val instanceNameS = accumulo val zooServersS = localhost:2181 val instance: Instance = new ZooKeeperInstance(instanceNameS, zooServersS) val connector: Connector = instance.getConnector( root, new PasswordToken(password)) val auths = new Authorizations(exampleVis) val scanner = connector.createScanner(batchtest1, auths) scanner.setRange(new Range(row_00, row_10)) for(entry: Entry[Key, Value] - scanner) { println(entry.getKey + is + entry.getValue) } will give the first ten rows of table data. when I try to create the RDD thusly: val rdd2 = sparkContext.newAPIHadoopRDD ( new Configuration(), classOf[org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat], classOf[org.apache.accumulo.core.data.Key], classOf[org.apache.accumulo.core.data.Value] ) I get an RDD returned to me that I can't do much with due to the following error: java.io.IOException: Input info has not been set. at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:630) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:343) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:538) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:220) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367) at org.apache.spark.rdd.RDD.count(RDD.scala:927) which totally makes sense in light of the fact that I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com [cid:AE39C43E-3FF7-4C90-BCE4-9711C84C4CB8@cld.annailabs.com] www.AnnaiSystems.comhttp://www.AnnaiSystems.com On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? 
DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have
R: iPython Notebook + Spark + Accumulo -- best practice?
Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? 
Re: iPython Notebook + Spark + Accumulo -- best practice?
kk - I'll put something together and get back to you with more :-)

DAVID HOLIDAY
Software Engineer
760 607 3300 | Office
312 758 8385 | Mobile
dav...@annaisystems.com
www.AnnaiSystems.com http://www.AnnaiSystems.com

On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.com wrote:

Once you set up spark-notebook, it'll handle the submits for interactive work. Non-interactive work is not handled by it; for that, spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local mode.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor
Re: iPython Notebook + Spark + Accumulo -- best practice?
hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go?

DAVID HOLIDAY
Software Engineer
760 607 3300 | Office
312 758 8385 | Mobile
dav...@annaisystems.com
www.AnnaiSystems.com http://www.AnnaiSystems.com
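A minimal sketch of the spark-submit route being asked about here, assuming a standalone Scala driver packaged as a jar; the object name FirstTenRows, the jar name, and the master setting are illustrative placeholders rather than anything from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// Self-contained driver: spark-submit runs its main() as a batch job,
// whereas the notebook route reuses a long-lived SparkContext that the
// notebook server itself owns for interactive cells.
object FirstTenRows {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FirstTenRows")
    val sc = new SparkContext(conf)
    // ... build and use an RDD over the Accumulo table here ...
    sc.stop()
  }
}

// Packaged into a jar and launched from a shell, e.g.:
//   spark-submit --class FirstTenRows --master local[*] first-ten-rows.jar

Either wiring works; the tradeoff raised in the thread is batch-style submission versus the shared interactive context that a notebook backend manages for you.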
iPython Notebook + Spark + Accumulo -- best practice?
hi all,

I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-)

I've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully set up a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, and to write a simple program in Scala that, when fired off to Spark via spark-submit, connects to Accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions:

1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo, or is it preferable to create some kind of open stream between the two?

2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission?

ultimately, what I'm looking for is help locating docs, blogs, etc. that may shed some light on this. t/y in advance!

d

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
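On question 2: the blog post linked above appears to describe enriching Spark's API from the outside, which in Scala is commonly done with an implicit class; that requires no change to Spark's source and no standing job, only having the implicit on the driver's classpath. A minimal sketch, with the names RichRDD and tenRows chosen purely for illustration:

import org.apache.spark.rdd.RDD

object RichRDD {
  // Implicit wrapper: every RDD in scope gains a tenRows() method,
  // without refactoring Spark's source or submitting an endless job.
  implicit class RichRDDFunctions[T](rdd: RDD[T]) {
    def tenRows(): Array[T] = rdd.take(10)
  }
}

// Usage, e.g. from a notebook cell or any driver program:
//   import RichRDD._
//   someRdd.tenRows().foreach(println)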
Re: iPython Notebook + Spark + Accumulo -- best practice?
Hi David,

W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend, or you can use spark-notebook. Both have interesting tradeoffs.

If you're looking for a single notebook platform for your data scientists that has R and Python as well as a Spark shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, and it's early days for documentation, blogs, etc. Upsides are that R and Python work beautifully and that the ipython committers are super-helpful.

If you are OK with a primarily spark/scala experience, then I suggest you go with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel's (though not as good as iPython with Python), and the committer is awesome with help. Downside is that you won't get R and Python.

FWIW: I'm using both at the moment! Hope that helps.

*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor