Thanks. Sorry, the last section was supposed to be:
streams.par.foreach { nameAndStream =>
  nameAndStream._2.foreachRDD { rdd =>
    val df = sqlContext.jsonRDD(rdd)
    df.insertInto(nameAndStream._1)
  }
}
ssc.start()
On Fri, Jul 24, 2015 at 10:39 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
You don't need the par (parallel) versions of the Scala collections here,
actually. Recall that you are building a pipeline in the driver; it
doesn't start running cluster tasks until ssc.start() is called, at which
point Spark will figure out the task parallelism. In fact, you might as
well do the
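The point above can be illustrated with a plain-Scala analogy (hypothetical names, no Spark): the driver-side foreach only registers output operations, so a sequential loop is cheap; the actual work happens when start() is called, just as with ssc.start().

```scala
// Hypothetical sketch: register() stands in for foreachRDD (it only
// records a closure), and start() stands in for ssc.start() (it runs
// the recorded work). No Spark involved.
object LazyPipelineSketch {
  // Recorded "output operations", analogous to registered foreachRDD closures.
  val registered = scala.collection.mutable.Buffer[() => String]()

  // Only records the closure; nothing executes here.
  def register(name: String)(body: String => String): Unit =
    registered += (() => body(name))

  // Executes everything that was registered, in order.
  def start(): Seq[String] = registered.toSeq.map(f => f())

  def main(args: Array[String]): Unit = {
    val streams = Seq("table_a", "table_b")
    // A sequential foreach is fine: each iteration is cheap registration.
    streams.foreach { name =>
      register(name) { n => s"insertInto($n)" }
    }
    println(start().mkString(","))
    // prints insertInto(table_a),insertInto(table_b)
  }
}
```

Since each loop iteration only records work, parallelizing the loop with .par buys nothing; the parallelism that matters is decided later, when the recorded pipeline actually runs.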