Re: iPython Notebook + Spark + Accumulo -- best practice?
On Mar 25, 2015, at 5:27 PM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

Hmmm... this seems very Accumulo-specific, doesn't it? Not sure how to help with that.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/

On Tue, Mar 24, 2015 at 4:09 PM, David Holiday <dav...@annaisystems.com> wrote:

hi all,

got a vagrant image with spark-notebook, Spark, Accumulo, and Hadoop all running. from the notebook I can manually create a scanner and pull test data from a table I created using one of the Accumulo examples:

    val instanceNameS = "accumulo"
    val zooServersS = "localhost:2181"
    val instance: Instance = new ZooKeeperInstance(instanceNameS, zooServersS)
    val connector: Connector = instance.getConnector("root", new PasswordToken("password"))
    val auths = new Authorizations("exampleVis")
    val scanner = connector.createScanner("batchtest1", auths)

    scanner.setRange(new Range("row_00", "row_10"))

    for (entry: Entry[Key, Value] <- scanner) {
      println(entry.getKey + " is " + entry.getValue)
    }

will give the first ten rows of table data. when I try to create the RDD thusly:

    val rdd2 = sparkContext.newAPIHadoopRDD(
      new Configuration(),
      classOf[org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat],
      classOf[org.apache.accumulo.core.data.Key],
      classOf[org.apache.accumulo.core.data.Value]
    )

I get an RDD returned to me that I can't do much with, due to the following error:

    java.io.IOException: Input info has not been set.
        at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:630)
        at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:343)
        at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:538)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:220)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367)
        at org.apache.spark.rdd.RDD.count(RDD.scala:927)

which totally makes sense, given that I haven't specified any parameters about which table to connect to, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD?

DAVID HOLIDAY
Software Engineer
dav...@annaisystems.com | www.AnnaiSystems.com
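The missing piece is the InputFormat configuration: AccumuloInputFormat validates its "input info" (connector credentials, instance, table, auths) against the Hadoop Configuration it is handed, and the bare `new Configuration()` above carries none of it. Below is a minimal sketch of the wiring, assuming the Accumulo 1.6-era static setters on AccumuloInputFormat and a notebook where `sparkContext` is already in scope; the table, user, and auth values just mirror the snippet above.

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.client.security.tokens.PasswordToken
    import org.apache.accumulo.core.data.{Key, Range, Value}
    import org.apache.accumulo.core.security.Authorizations
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    // A Hadoop Job here is just a convenient holder for the Configuration
    // that AccumuloInputFormat reads its "input info" from.
    val job = Job.getInstance(new Configuration())

    // Who connects, and to which instance/ZooKeeper quorum. (This overload of
    // setZooKeeperInstance is deprecated in 1.6 in favor of a
    // ClientConfiguration variant, but it is still present.)
    AccumuloInputFormat.setConnectorInfo(job, "root", new PasswordToken("password"))
    AccumuloInputFormat.setZooKeeperInstance(job, "accumulo", "localhost:2181")

    // The actual "input info" the IOException complains about:
    // table name, scan authorizations, and (optionally) ranges.
    AccumuloInputFormat.setInputTableName(job, "batchtest1")
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations("exampleVis"))
    AccumuloInputFormat.setRanges(job, java.util.Collections.singleton(new Range("row_00", "row_10")))

    // Hand Spark the populated configuration instead of an empty one.
    val rdd2 = sparkContext.newAPIHadoopRDD(
      job.getConfiguration,
      classOf[AccumuloInputFormat],
      classOf[Key],
      classOf[Value]
    )

    rdd2.take(10).foreach { case (k, v) => println(k + " is " + v) }

The same Job-backed Configuration pattern applies on the write side with AccumuloOutputFormat.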
On Mar 19, 2015, at 11:25 AM, David Holiday <dav...@annaisystems.com> wrote:

kk - I'll put something together and get back to you with more :-)

On Mar 19, 2015, at 10:59 AM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

Once you set up spark-notebook, it'll handle the submits for interactive work. Non-interactive work is not handled by it; for that, spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local mode.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/

On Thu, Mar 19, 2015 at 9:51 AM, David Holiday <dav...@annaisystems.com> wrote:

hi all - thx for the alacritous replies! so regarding how to get things from notebook to Spark and back, am I correct that spark-submit is the way to go?
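For the non-interactive route, whatever gets handed to spark-submit is just a jar with a main class; here is a bare-bones sketch of such a driver (the object name AccumuloRowDump and the jar name are illustrative, and the Accumulo wiring is the same as in the snippet further up):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal non-interactive driver: everything the notebook does
    // interactively happens inside main() instead.
    object AccumuloRowDump {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("AccumuloRowDump"))
        // ... configure AccumuloInputFormat on a Hadoop Job as shown above,
        // then build and use the RDD:
        //   val rdd = sc.newAPIHadoopRDD(job.getConfiguration, ...)
        //   println(rdd.count())
        sc.stop()
      }
    }

Packaged into a jar, it would be launched with something like `spark-submit --class AccumuloRowDump --master local[*] my-job.jar`.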
On Mar 19, 2015, at 1:14 AM, Paolo Platter <paolo.plat...@agilelab.it> wrote:

Yes, I would suggest spark-notebook too. It's very simple to set up, and it's growing pretty fast.

Paolo
Sent from my Windows Phone

From: Irfan Ahmad <ir...@cloudphysics.com>
Sent: 19/03/2015 04:05
To: davidh <dav...@annaisystems.com>
Cc: user@spark.apache.org
Subject: Re: iPython Notebook + Spark + Accumulo -- best practice?

I forgot to mention that there are also Zeppelin and jove-notebook, but I haven't got any experience with those yet.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/
On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad <ir...@cloudphysics.com> wrote:

Hi David,

W00t indeed, and great questions. On the notebook front, there are two options, depending on what you are looking for. You can either go with iPython 3 with spark-kernel as a backend, or you can use spark-notebook. Both have interesting tradeoffs.

If you are looking for a single notebook platform for your data scientists that has R and Python as well as a Spark shell, you'll likely want to go with iPython + spark-kernel. Downsides of the spark-kernel project are that data visualization isn't quite there yet, and it's early days for documentation, blogs, etc. The upside is that R and Python work beautifully and that the iPython committers are super-helpful.

If you are OK with a primarily Spark/Scala experience, then I suggest spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel's (though not as good as iPython with Python), and the committer is awesome with help. The downside is that you won't get R and Python.

FWIW: I'm using both at the moment! Hope that helps.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/

On Wed, Mar 18, 2015 at 5:45 PM, davidh <dav...@annaisystems.com> wrote:

hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-)

i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. the end user will do her work via notebook. thus far, I've successfully set up a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, and to write a simple program in Scala that, when fired off to Spark via spark-submit, connects to Accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions:

1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. let's say Sally, a user, wants to do some analytic work on her data. she pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? do I, from notebook, use spark-submit to send a job to Spark and let Spark worry about hooking into Accumulo, or is it preferable to create some kind of open stream between the two?

2) if I want to extend Spark's API, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes (http://blog.madhukaraphatak.com/extending-spark-api)? is there an alternative (other than refactoring Spark's source) that doesn't involve extending the API via a job submission? (see the sketch below)

ultimately, what I'm looking for is help locating docs, blogs, etc. that may shed some light on this. t/y in advance!

d

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html
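On question 2: the post linked above relies on Scala implicits to bolt new methods onto RDD at compile time, which doesn't require any long-running submitted job; the extension lives in whatever code imports it, whether that's a notebook cell or a jar handed to spark-submit. A rough sketch of the pattern follows (the RDDExtensions, RichRDD, and evens names are made up for illustration; the linked post goes further and defines custom RDD subclasses as well):

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    object RDDExtensions {
      // Implicit conversion: any RDD[T] silently gains the methods of RichRDD
      // wherever RDDExtensions._ is imported. No change to Spark's source and
      // no standing job is involved.
      implicit class RichRDD[T: ClassTag](rdd: RDD[T]) {
        // Hypothetical added operation: keep the elements at even positions.
        def evens(): RDD[T] =
          rdd.zipWithIndex().collect { case (x, i) if i % 2 == 0 => x }
      }
    }

    // Usage, e.g. from a notebook cell:
    //   import RDDExtensions._
    //   sc.parallelize(1 to 10).evens().collect()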
Re: iPython Notebook + Spark + Accumulo -- best practice?
broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone -- Da: Irfan Ahmad ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidh dav...@annaisystems.com Cc: user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. 
I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission? ultimately what I'm looking for help locating docs, blogs, etc that may shed some light on this. t/y in advance! d
Re: iPython Notebook + Spark + Accumulo -- best practice?
760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. 
Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? is there an alternative (other than refactoring spark's source
Re: iPython Notebook + Spark + Accumulo -- best practice?
(RDD.scala:220) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367) at org.apache.spark.rdd.RDD.count(RDD.scala:927) which totally makes sense in light of the fact that I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. 
Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo
Re: iPython Notebook + Spark + Accumulo -- best practice?
:14 AM, Paolo Platter paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone -- Da: Irfan Ahmad ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidh dav...@annaisystems.com Cc: user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? 
is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission? ultimately what I'm looking for help locating docs, blogs, etc that may shed some light on this. t/y in advance! d -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html Sent from the Apache Spark User List mailing list archive at Nabble.com http://nabble.com/. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: iPython Notebook + Spark + Accumulo -- best practice?
://www.annaisystems.com/ On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. 
Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need
Re: iPython Notebook + Spark + Accumulo -- best practice?
Hmmm this seems very accumulo-specific, doesn't it? Not sure how to help with that. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Tue, Mar 24, 2015 at 4:09 PM, David Holiday dav...@annaisystems.com wrote: hi all, got a vagrant image with spark notebook, spark, accumulo, and hadoop all running. from notebook I can manually create a scanner and pull test data from a table I created using one of the accumulo examples: val instanceNameS = accumuloval zooServersS = localhost:2181val instance: Instance = new ZooKeeperInstance(instanceNameS, zooServersS)val connector: Connector = instance.getConnector( root, new PasswordToken(password))val auths = new Authorizations(exampleVis)val scanner = connector.createScanner(batchtest1, auths) scanner.setRange(new Range(row_00, row_10)) for(entry: Entry[Key, Value] - scanner) { println(entry.getKey + is + entry.getValue)} will give the first ten rows of table data. when I try to create the RDD thusly: val rdd2 = sparkContext.newAPIHadoopRDD ( new Configuration(), classOf[org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat], classOf[org.apache.accumulo.core.data.Key], classOf[org.apache.accumulo.core.data.Value] ) I get an RDD returned to me that I can't do much with due to the following error: java.io.IOException: Input info has not been set. at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:630) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:343) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:538) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:220) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367) at org.apache.spark.rdd.RDD.count(RDD.scala:927) which totally makes sense in light of the fact that I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com www.AnnaiSystems.com On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! 
so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.com http://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone -- Da: Irfan Ahmad ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidh dav...@annaisystems.com Cc: user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. *Irfan Ahmad* CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options
Re: iPython Notebook + Spark + Accumulo -- best practice?
-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? 
is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission? ultimately what I'm looking for help locating docs, blogs, etc that may shed some light on this. t/y in advance! d -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html Sent from the Apache Spark User List mailing list archive at Nabble.comhttp://nabble.com/. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
Re: iPython Notebook + Spark + Accumulo -- best practice?
hi all, got a vagrant image with spark notebook, spark, accumulo, and hadoop all running. from notebook I can manually create a scanner and pull test data from a table I created using one of the accumulo examples: val instanceNameS = accumulo val zooServersS = localhost:2181 val instance: Instance = new ZooKeeperInstance(instanceNameS, zooServersS) val connector: Connector = instance.getConnector( root, new PasswordToken(password)) val auths = new Authorizations(exampleVis) val scanner = connector.createScanner(batchtest1, auths) scanner.setRange(new Range(row_00, row_10)) for(entry: Entry[Key, Value] - scanner) { println(entry.getKey + is + entry.getValue) } will give the first ten rows of table data. when I try to create the RDD thusly: val rdd2 = sparkContext.newAPIHadoopRDD ( new Configuration(), classOf[org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat], classOf[org.apache.accumulo.core.data.Key], classOf[org.apache.accumulo.core.data.Value] ) I get an RDD returned to me that I can't do much with due to the following error: java.io.IOException: Input info has not been set. at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:630) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:343) at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:538) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:98) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:220) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367) at org.apache.spark.rdd.RDD.count(RDD.scala:927) which totally makes sense in light of the fact that I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com [cid:AE39C43E-3FF7-4C90-BCE4-9711C84C4CB8@cld.annailabs.com] www.AnnaiSystems.comhttp://www.AnnaiSystems.com On Mar 19, 2015, at 11:25 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: kk - I'll put something together and get back to you with more :-) DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Once you setup spark-notebook, it'll handle the submits for interactive work. Non-interactive is not handled by it. For that spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local-mode. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Thu, Mar 19, 2015 at 9:51 AM, David Holiday dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? 
DAVID HOLIDAY Software Engineer 760 607 3300tel:760%20607%203300 | Office 312 758 8385tel:312%20758%208385 | Mobile dav...@annaisystems.commailto:broo...@annaisystems.com GetFileAttachment.jpg www.AnnaiSystems.comhttp://www.annaisystems.com/ On Mar 19, 2015, at 1:14 AM, Paolo Platter paolo.plat...@agilelab.itmailto:paolo.plat...@agilelab.it wrote: Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com/ Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have
R: iPython Notebook + Spark + Accumulo -- best practice?
Yes, I would suggest spark-notebook too. It's very simple to setup and it's growing pretty fast. Paolo Inviata dal mio Windows Phone Da: Irfan Ahmadmailto:ir...@cloudphysics.com Inviato: 19/03/2015 04:05 A: davidhmailto:dav...@annaisystems.com Cc: user@spark.apache.orgmailto:user@spark.apache.org Oggetto: Re: iPython Notebook + Spark + Accumulo -- best practice? I forgot to mention that there is also Zeppelin and jove-notebook but I haven't got any experience with those yet. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 8:01 PM, Irfan Ahmad ir...@cloudphysics.commailto:ir...@cloudphysics.com wrote: Hi David, W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend or you can use spark-notebook. Both have interesting tradeoffs. If you have looking for a single notebook platform for your data scientists that has R, Python as well as a Spark Shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, early days for documentation and blogs/etc. Upside is that R and Python work beautifully and that the ipython committers are super-helpful. If you are OK with a primarily spark/scala experience, then I suggest you with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel (though not as good as iPython with Python) and the committer is awesome with help. Downside is that you won't get R and Python. FWIW: I'm using both at the moment! Hope that helps. Irfan Ahmad CTO | Co-Founder | CloudPhysicshttp://www.cloudphysics.com Best of VMworld Finalist Best Cloud Management Award NetworkWorld 10 Startups to Watch EMA Most Notable Vendor On Wed, Mar 18, 2015 at 5:45 PM, davidh dav...@annaisystems.commailto:dav...@annaisystems.com wrote: hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-) i've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully setup a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, create a simple program in scala that, when fired off to Spark via spark-submit, connects to accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions: 1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo or is it preferable to create some kind of open stream between the two? 2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? 
Re: iPython Notebook + Spark + Accumulo -- best practice?
kk - I'll put something together and get back to you with more :-)

DAVID HOLIDAY
Software Engineer
760 607 3300 | Office
312 758 8385 | Mobile
dav...@annaisystems.com
www.AnnaiSystems.com http://www.AnnaiSystems.com

On Mar 19, 2015, at 10:59 AM, Irfan Ahmad ir...@cloudphysics.com wrote:

Once you set up spark-notebook, it'll handle the submits for interactive work. Non-interactive work is not handled by it; for that, spark-kernel could be used. Give it a shot ... it only takes 5 minutes to get it running in local mode.

Irfan Ahmad
CTO | Co-Founder | CloudPhysics http://www.cloudphysics.com/
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor
Re: iPython Notebook + Spark + Accumulo -- best practice?
hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go?

DAVID HOLIDAY
Software Engineer
760 607 3300 | Office
312 758 8385 | Mobile
dav...@annaisystems.com
www.AnnaiSystems.com http://www.AnnaiSystems.com
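A minimal sketch of the spark-submit route being asked about here, assuming a standalone Scala driver packaged as a jar; the object name FirstTenRows, the jar name, and the master setting are illustrative placeholders rather than anything from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// Self-contained driver: spark-submit runs its main() as a batch job,
// whereas the notebook route reuses a long-lived SparkContext that the
// notebook server itself owns for interactive cells.
object FirstTenRows {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FirstTenRows")
    val sc = new SparkContext(conf)
    // ... build and use an RDD over the Accumulo table here ...
    sc.stop()
  }
}

// Packaged into a jar and launched from a shell, e.g.:
//   spark-submit --class FirstTenRows --master local[*] first-ten-rows.jar

Either wiring works; the tradeoff raised in the thread is batch-style submission versus the shared interactive context that a notebook backend manages for you.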
iPython Notebook + Spark + Accumulo -- best practice?
hi all,

I've been DDGing, Stack Overflowing, Twittering, RTFMing, and scanning through this archive with only moderate success. in other words -- my way of saying sorry if this is answered somewhere obvious and I missed it :-)

I've been tasked with figuring out how to connect Notebook, Spark, and Accumulo together. The end user will do her work via notebook. thus far, I've successfully set up a Vagrant image containing Spark, Accumulo, and Hadoop. I was able to use some of the Accumulo example code to create a table populated with data, and to write a simple program in Scala that, when fired off to Spark via spark-submit, connects to Accumulo and prints the first ten rows of data in the table. so w00t on that - but now I'm left with more questions:

1) I'm still stuck on what's considered 'best practice' in terms of hooking all this together. Let's say Sally, a user, wants to do some analytic work on her data. She pecks the appropriate commands into notebook and fires them off. how does this get wired together on the back end? Do I, from notebook, use spark-submit to send a job to spark and let spark worry about hooking into accumulo, or is it preferable to create some kind of open stream between the two?

2) if I want to extend spark's api, do I need to first submit an endless job via spark-submit that does something like what this gentleman describes http://blog.madhukaraphatak.com/extending-spark-api ? is there an alternative (other than refactoring spark's source) that doesn't involve extending the api via a job submission?

ultimately, what I'm looking for is help locating docs, blogs, etc. that may shed some light on this. t/y in advance!

d

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/iPython-Notebook-Spark-Accumulo-best-practice-tp22137.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
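On question 2: the blog post linked above appears to describe enriching Spark's API from the outside, which in Scala is commonly done with an implicit class; that requires no change to Spark's source and no standing job, only having the implicit on the driver's classpath. A minimal sketch, with the names RichRDD and tenRows chosen purely for illustration:

import org.apache.spark.rdd.RDD

object RichRDD {
  // Implicit wrapper: every RDD in scope gains a tenRows() method,
  // without refactoring Spark's source or submitting an endless job.
  implicit class RichRDDFunctions[T](rdd: RDD[T]) {
    def tenRows(): Array[T] = rdd.take(10)
  }
}

// Usage, e.g. from a notebook cell or any driver program:
//   import RichRDD._
//   someRdd.tenRows().foreach(println)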
Re: iPython Notebook + Spark + Accumulo -- best practice?
Hi David,

W00t indeed and great questions. On the notebook front, there are two options depending on what you are looking for. You can either go with iPython 3 with Spark-kernel as a backend, or you can use spark-notebook. Both have interesting tradeoffs.

If you're looking for a single notebook platform for your data scientists that has R and Python as well as a Spark shell, you'll likely want to go with iPython + Spark-kernel. Downsides with the spark-kernel project are that data visualization isn't quite there yet, and it's early days for documentation, blogs, etc. Upsides are that R and Python work beautifully and that the ipython committers are super-helpful.

If you are OK with a primarily spark/scala experience, then I suggest you go with spark-notebook. Upsides are that the project is a little further along, visualization support is better than spark-kernel's (though not as good as iPython with Python), and the committer is awesome with help. Downside is that you won't get R and Python.

FWIW: I'm using both at the moment! Hope that helps.

*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* http://www.cloudphysics.com
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor