You would need to use *native* Cassandra APIs (e.g., the DataStax driver) in each Executor - not org.apache.spark.sql.cassandra.CassandraSQLContext - which includes creating a separate Cassandra connection on each Executor.
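A minimal sketch of that pattern, assuming the spark-cassandra-connector library is on the classpath; the keyspace/table names and the parse() helper are hypothetical stand-ins for your own code:

```scala
import com.datastax.spark.connector.cql.CassandraConnector

// CassandraConnector is serializable, so it is built once on the Driver
// and shipped to the Executors inside the closure.
val connector = CassandraConnector(sparkContext.getConf)

dumpFilesRDD.foreachPartition { paths =>
  // withSessionDo borrows a per-Executor session from the connector's
  // internal pool - no SparkContext/SparkSession is needed here.
  connector.withSessionDo { session =>
    val stmt = session.prepare(
      "INSERT INTO my_keyspace.dumps (path, content) VALUES (?, ?)") // hypothetical table
    paths.foreach { path =>
      val content = parse(path) // hypothetical: parses the dump file into a String
      session.execute(stmt.bind(path, content))
    }
  }
}
```

The key point is that only serializable handles (here, the connector and its config) cross the Driver/Executor boundary; the actual session is opened lazily on each Executor.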
2017-05-28 15:47 GMT-07:00 Abdulfattah Safa <fattah.s...@gmail.com>:

> So I can't run SQL queries in Executors?
>
> On Sun, May 28, 2017 at 11:00 PM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> You can't do that. SparkContext and SparkSession can exist only on the
>> Driver.
>>
>> On Sun, May 28, 2017 at 6:56 AM, Abdulfattah Safa <fattah.s...@gmail.com>
>> wrote:
>>
>>> How can I use SparkContext (to create a SparkSession or Cassandra
>>> sessions) in executors?
>>> If I pass it as a parameter to foreach or foreachPartition, it will
>>> have a null value.
>>> Shall I create a new SparkContext in each executor?
>>>
>>> Here is what I'm trying to do:
>>> Read a dump directory with millions of dump files as follows:
>>>
>>> dumpFiles = Directory.listFiles(dumpDirectory)
>>> dumpFilesRDD = sparkContext.parallelize(dumpFiles, numOfSlices)
>>> dumpFilesRDD.foreachPartition(dumpFilePath -> parse(dumpFilePath))
>>> .
>>> .
>>> .
>>>
>>> In parse(), each dump file is parsed and inserted into the database
>>> using Spark SQL. To do that, a SparkContext is needed inside parse()
>>> in order to call the sql() method.