Thanks. Sorry, the last section was supposed to be:
streams.par.foreach { nameAndStream =>
  nameAndStream._2.foreachRDD { rdd =>
    val df = sqlContext.jsonRDD(rdd)
    df.insertInto(nameAndStream._1)
  }
}
ssc.start()
On Fri, Jul 24, 2015 at 10:39 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
You don't need the par (parallel) versions of the Scala collections here,
actually. Recall that you are building a pipeline in the driver; it
doesn't start running cluster tasks until ssc.start() is called, at which
point Spark will figure out the task parallelism. In fact, you might as
well do the
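The point above can be illustrated with a plain-Scala analogy (hypothetical names, no Spark): the driver-side foreach only registers output operations, so a sequential loop is cheap; the actual work happens when start() is called, just as with ssc.start().

```scala
// Hypothetical sketch: register() stands in for foreachRDD (it only
// records a closure), and start() stands in for ssc.start() (it runs
// the recorded work). No Spark involved.
object LazyPipelineSketch {
  // Recorded "output operations", analogous to registered foreachRDD closures.
  val registered = scala.collection.mutable.Buffer[() => String]()

  // Only records the closure; nothing executes here.
  def register(name: String)(body: String => String): Unit =
    registered += (() => body(name))

  // Executes everything that was registered, in order.
  def start(): Seq[String] = registered.toSeq.map(f => f())

  def main(args: Array[String]): Unit = {
    val streams = Seq("table_a", "table_b")
    // A sequential foreach is fine: each iteration is cheap registration.
    streams.foreach { name =>
      register(name) { n => s"insertInto($n)" }
    }
    println(start().mkString(","))
    // prints insertInto(table_a),insertInto(table_b)
  }
}
```

Since each loop iteration only records work, parallelizing the loop with .par buys nothing; the parallelism that matters is decided later, when the recorded pipeline actually runs.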