You would need to use *native* Cassandra APIs in each Executor -
not org.apache.spark.sql.cassandra.CassandraSQLContext -
including creating a separate Cassandra connection on each Executor.
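
Roughly like the following (a minimal sketch against the DataStax Java
driver 3.x; the contact point, keyspace, table schema, and the ParsedDump
type returned by parse() are all assumptions):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    dumpFilesRDD.foreachPartition(paths -> {          // runs on an executor
        // One native connection per partition; no SparkContext involved.
        try (Cluster cluster = Cluster.builder()
                 .addContactPoint("cassandra-host")   // assumed host
                 .build();
             Session session = cluster.connect("my_keyspace")) { // assumed keyspace
            PreparedStatement stmt = session.prepare(
                "INSERT INTO dumps (id, body) VALUES (?, ?)");   // assumed table
            while (paths.hasNext()) {
                ParsedDump d = parse(paths.next());   // your existing parser
                session.execute(stmt.bind(d.id, d.body));
            }
        }
    });

Opening the connection inside foreachPartition keeps the non-serializable
Cluster/Session objects off the driver; in practice you would usually cache
one Cluster per executor JVM rather than open one per partition.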

2017-05-28 15:47 GMT-07:00 Abdulfattah Safa <fattah.s...@gmail.com>:

> So I can't run SQL queries in Executors?
>
> On Sun, May 28, 2017 at 11:00 PM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> You can't do that. SparkContext and SparkSession can exist only on the
>> Driver.
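>>
>> If the goal is just to get the parsed records into Cassandra, one common
>> pattern (sketch only; spark here is the driver-side SparkSession, and
>> ParsedDump, the keyspace, and the table name are made up) is to keep
>> every SparkSession/SQL call on the driver and ship only plain functions
>> to the executors, writing out through the spark-cassandra-connector:
>>
>>     import org.apache.spark.api.java.function.MapFunction;
>>     import org.apache.spark.sql.Dataset;
>>     import org.apache.spark.sql.Encoders;
>>
>>     Dataset<ParsedDump> parsed = spark
>>         .createDataset(dumpFiles, Encoders.STRING()) // on the driver
>>         .map((MapFunction<String, ParsedDump>) path -> parse(path),
>>              Encoders.bean(ParsedDump.class));       // parse() runs on executors
>>     parsed.write()
>>         .format("org.apache.spark.sql.cassandra")    // spark-cassandra-connector
>>         .option("keyspace", "my_keyspace")           // assumed keyspace
>>         .option("table", "dumps")                    // assumed table
>>         .mode("append")
>>         .save();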
>>
>> On Sun, May 28, 2017 at 6:56 AM, Abdulfattah Safa <fattah.s...@gmail.com>
>> wrote:
>>
>>> How can I use SparkContext (to create a SparkSession or Cassandra
>>> sessions) in executors?
>>> If I pass it as a parameter to foreach or foreachPartition, it will
>>> have a null value.
>>> Shall I create a new SparkContext in each executor?
>>>
>>> Here is what I'm trying to do:
>>> Read a dump directory with millions of dump files as follows:
>>>
>>>     dumpFiles = Directory.listFiles(dumpDirectory)
>>>     dumpFilesRDD = sparkContext.parallelize(dumpFiles, numOfSlices)
>>>     dumpFilesRDD.foreachPartition(paths -> paths.forEachRemaining(path -> parse(path)))
>>>     .
>>>     .
>>>     .
>>>
>>> In parse(), each dump file is parsed and inserted into the database
>>> using Spark SQL. To do that, SparkContext is needed inside parse() so
>>> it can call the sql() method.
>>>
>>
>>
